(ML) Machine Learning Lab Manual
(ML) Machine Learning Lab Manual
no Experiments Dates
1. Implement and demonstrate the FIND-S algorithm for finding the most specific 20/1/23
hypothesis based on a given set of training data samples. Read the training data
from a .CSV file.
2. 10/2/23
Python Implementation of Candidate-Elimination
3. Write s program to demonstrate working of decision tree based ID3 algorithm 17/2/23
4. Exercises to solve the real world problems using the following machine learning methods. 3/3/23
• Linear Regression
• Logistic Regression
• Binary classifier
9. Write a program to implement k-means algorithm to classify the iris dataset print 19/4/23
both correct and wrong predictions
10. Write a program to implement principle component analysis 19/4/23
WEEK – 1: EXPERIMENT 1:
1. Implement and demonstrate the FIND-S algorithm for finding the most specific
hypothesis based on a given set of training data samples. Read the training data
from a .CSV file.
FIND-S Algorithm
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
For each attribute constraint ai in h
If the constraint ai is satisfied by x
Then do nothing
Else replace ai in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
Training Examples:
Example Sky AirTemp Humidity Wind Water Forecast EnjoySport
1 Sunny Warm Normal Strong Warm Same Yes
2 Sunny Warm High Strong Warm Same Yes
3 Rainy Cold High Strong Warm Change No
4 Sunny Warm High Strong Cool Change Yes
Program:
import csv
num_attributes = 6
a = []
print("\n The Given Training Data Set \n")
with open('enjoysport.csv', 'r') as csvfile:
reader = csv.reader(csvfile)
for row in reader:
a.append (row)
print(row)
print("\n The initial value of hypothesis: ")
hypothesis = ['0'] * num_attributes
print(hypothesis)
for j in range(0,num_attributes):
hypothesis[j] = a[0][j];
print("\n Find S: Finding a Maximally Specific Hypothesis\n")
for i in range(0,len(a)):
if a[i][num_attributes]=='yes':
for j in range(0,num_attributes):
if a[i][j]!=hypothesis[j]:
hypothesis[j]='?'
else :
hypothesis[j]= a[i][j]
print(" For Training instance No:{0} the hypothesis is
".format(i),hypothesis)
print("\n The Maximally Specific Hypothesis for a given Training
Examples :\n")
print(hypothesis)
Data Set:
sunny warm normal strong warm same yes
sunny warm high strong warm same yes
rainy cold high strong warm change no
sunny warm high strong cool change yes
Output:
The Given Training Data Set
['sunny', 'warm', 'normal', 'strong', 'warm', 'same', 'yes']
['sunny', 'warm', 'high', 'strong', 'warm', 'same', 'yes']
['rainy', 'cold', 'high', 'strong', 'warm', 'change', 'no']
['sunny', 'warm', 'high', 'strong', 'cool', 'change', 'yes']
The initial value of hypothesis:
['0', '0', '0', '0', '0', '0']
Find S: Finding a Maximally Specific Hypothesis
For Training Example No:0 the hypothesis is
['sunny', 'warm', 'normal', 'strong', 'warm', 'same']
For Training Example No:1 the hypothesis is
['sunny', 'warm', '?', 'strong', 'warm', 'same']
For Training Example No:2 the hypothesis is
'sunny', 'warm', '?', 'strong', 'warm', 'same']
For Training Example No:3 the hypothesis is
'sunny', 'warm', '?', 'strong', '?', '?']
The Maximally Specific Hypothesis for a given Training Examples:
['sunny', 'warm', '?', 'strong', '?', '?']
WEEK-2: EXPERIMENT 2 :
AIM: Python Implementation of Candidate-Elimination
sky, air temp, humidity, wind, water, for cast, enjoy sport.
sunny, warm, normal, strong, ,warm, same, yes.
sunny, warm, high, strong, warm, same, yes.
rainy, cold, high, strong, warm, change, no.
sunny, warm, high, strong, cool, change, yes.
program
import numpy as np
import pandas as pd
# Reading the data from CSV file
data = pd.read_csv('candidate.csv')
concepts = np.array(data.iloc[:,:-1])
print("\nInstances are:\n",concepts)
target = np.array(data.iloc[:,-1])
print("\nTarget Values are: ",target)
def train(concepts, target):
indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
for i in indices:
general_h.remove(['?', '?', '?', '?', '?', '?'])
WEEK-3: EXPERIMENT 3:
AIM: Write s program to demonstrate working of decision tree based ID3 algorithm
Day Outlook Temperature Humidity Wind PlayTennis
Program:
import pandas as pd
import math
import numpy as np
data = pd.read_csv("dataset.csv")
features = [feat for feat in data]
features.remove("answer")
class Node:
def __init__(self):
self.children = []
self.value = ""
self.isLeaf = False
self.pred = ""
def entropy(examples):
pos = 0.0
neg = 0.0
for _, row in examples.iterrows():
if row["answer"] == "yes":
pos += 1
else:
neg += 1
if pos == 0.0 or neg == 0.0:
return 0.0
else:
p = pos / (pos + neg)
n = neg / (pos + neg)
return -(p * math.log(p, 2) + n * math.log(n, 2))
max_gain = 0
max_feat = ""
for feature in attrs:
#print ("\n",examples)
gain = info_gain(examples, feature)
if gain > max_gain:
max_gain = gain
max_feat = feature
root.value = max_feat
#print ("\nMax feature attr",max_feat)
uniq = np.unique(examples[max_feat])
#print ("\n",uniq)
for u in uniq:
#print ("\n",u)
subdata = examples[examples[max_feat] == u]
#print ("\n",subdata)
if entropy(subdata) == 0.0:
newNode = Node()
newNode.isLeaf = True
newNode.value = u
newNode.pred = np.unique(subdata["answer"])
root.children.append(newNode)
else:
dummyNode = Node()
dummyNode.value = u
new_attrs = attrs.copy()
new_attrs.remove(max_feat)
child = ID3(subdata, new_attrs)
dummyNode.children.append(child)
root.children.append(dummyNode)
return root
def printTree(root: Node, depth=0):
for i in range(depth):
print("\t", end="")
print(root.value, end="")
if root.isLeaf:
print(" -> ", root.pred)
print()
for child in root.children:
printTree(child, depth + 1)
def classify(root: Node, new):
for child in root.children:
if child.value == new[root.value]:
if child.isLeaf:
print ("Predicted Label for new example", new," is:", child.pred)
exit
else:
classify (child.children[0], new)
root = ID3(data, features)
print("Decision Tree is:")
printTree(root)
print ("------------------")
Week 4 : Experiment 4:
Aim: Exercises to solve the real world problems using the following machine
learning methods.
• Linear Regression
• Logistic Regression
• Binary classifier
Program:
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
dataset = load_breast_cancer(as_frame=True)
dataset['data'].head()
dataset['target'].head()
dataset['target'].value_counts()
X = dataset['data']
y = dataset['target']
from sklearn.model_selection import train_test_split
ss_train = StandardScaler()
X_train = ss_train.fit_transform(X_train)
ss_test = StandardScaler()
X_test = ss_test.fit_transform(X_test)
predictions = model.predict(X_test)
models = {}
from Sklearn.linear_model import LogisticRegression
models(‘logistic Regression’) = logistic Regression()
from Sklearn.svm import Linearsvc
models(‘Support Vector Machine’) = Linearsvc()
from Sklearn.tree import DecisionTreeclassifier
models(‘Decision Trees’) = DecisionTreeclassifier ()
from Sklearn.ensemble import RandomForestclassifier
models(‘Random Forest’) = RandomForestclassifier ()
from Sklearn.neighbors import kneighborsclassifier
models(‘k_Nearest neighbors’) = kneighborsclassifier()
from Sklearn.metrics import accuracy_score,precision_score,recall_score
accuracy,precision,recall={}{}{}
Output:
Accuracy precision recall
Logistic Regression 0.95804 0.955556 0.977273
Support Vector 0.937063 0.93333 0.965517
Machine
Decision tree 0.881119 0.84444 0.962025
Random Forest 0.965035 0.95556 0.988506
Naïve Bayes 0.937063 0.95556 0.945055
K-Nearest neighbor 0.951049 0.98889 0.936842
Week 5 : Experiment 5:
AIM: Develop a program for Bias, Variance, Remove duplicates , Cross Validation
Program:
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from mlxtend.evaluate import bias_variance_decomp
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
# separate into inputs and Outputs
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
# split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
# define the model
model = LinearRegression()
# estimate bias and variance
mse, bias, var = bias_variance_decomp(model, X_train, y_train, X_test, y_test, loss='mse',
num_rounds=200, random_seed=1)
# summarize results
print('MSE: %.3f' % mse)
print('Bias: %.3f' % bias)
print('Variance: %.3f' % var)
Output:
MSE: 22.418
BIAS: 20.744
VARAINCE: 1.674
Week 6: Experiment 6:
AIM: Build an Artificial Neural Networks by Implementing the Back
Propagation Algorithm and text the same using appropriate data sets
Program:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
data = load_iris()
X=data.data
y=data.target
y = pd.get_dummies(y).values
y[:3]
Output:
array([[1, 0, 0],
[1, 0, 0],
[1, 0, 0]], dtype=uint8)
Program:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=20,
random_state=4)
learning_rate = 0.1
iterations = 5000
N = y_train.size
input_size = 4
hidden_size = 2
output_size = 3
results = pd.DataFrame(columns=["mse", "accuracy"])
np.random.seed(10)
W1 = np.random.normal(scale=0.5, size=(input_size, hidden_size))
W2 = np.random.normal(scale=0.5, size=(hidden_size , Output_size))
def sigmoid(x):
return 1 / (1 + np.exp(-x))
def mean_squared_error(y_pred, y_true):
return ((y_pred - y_true)**2).sum() / (2*y_pred.size)
def accuracy(y_pred, y_true):
acc = y_pred.argmax(axis=1) == y_true.argmax(axis=1)
return acc.mean()
for itr in range(iterations):
Z1 = np.dot(x_train, W1)
A1 = sigmoid(Z1)
Z2 = np.dot(A1, W2)
A2 = sigmoid(Z2)
mse = mean_squared_error(A2, y_train)
acc = accuracy(A2, y_train)
results=results.append({"mse":mse, "accuracy":acc},ignore_index=True )
E1 = A2 - y_train
dW1 = E1 * A2 * (1 - A2)
E2 = np.dot(dW1, W2.T)
dW2 = E2 * A1 * (1 - A1)
W2_update = np.dot(A1.T, dW1) / N
W1_update = np.dot(x_train.T, dW2) / N
W2 = W2 - learning_rate * W2_update
W1 = W1 - learning_rate * W1_update
Output:
Program:
# feedforward
Z1 = np.dot(x_test, W1)
A1 = sigmoid(Z1)
Z2 = np.dot(A1, W2)
A2 = sigmoid(Z2)
Output:
Accuracy: 0.8
Week 7: Experiment 7:
Aim: Write a program to implement categorical Encoding. One-Hot Encoding
Program:
Import numpy as np
Colors = [“red”,”green”,”yellow”,”red”,”blue”]
total_colors=[“red”,”green”,”blue”,”black”,”yellow”]
mapping={}
for X in range (len(total_colors)):
mapping[total_colors[x]]=x
one_hot_encide=[]
for C in colors:
arr=list(np.zeros(len(total_colors),dtype=int))
arr[Mapping[c]]=1
one_hot_encode.append(a11)
print(one_hot_encode)
Output:
[[1,0,0,0,0],[0,1,0,0,0],[0,0,0,0,1],[1,0,0,0,0],[0,0,1,0,0]]
Week 8: Experiment 8:
Aim: Write a program to Implement support vector machine
Dataset:
Index user ID Gender Age Estimated salary purchased
0 15624510 Male 19 19000 0
1 15810944 Male 35 20000 0
2 15668575 Female 26 43000 0
3 15603246 Female 27 57000 0
4 15804002 Male 19 76000 0
Program:
Import numpy as nm
Import matplotlib.pyplot as mtp
Import pandas as pd
Data_set= pd.read_csv(‘user_data.csv’)
x=data_set.iloc[:[2,3]].values
y= data_set.iloc[:,4].values
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=
train_test_split(x,y,test_size=0.25,random_state=0)
from sklearn.preprocessing import standardscale()
st_x= Standardscaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.fit_transform(x_test)
x_test y_test
0 1 0
0 – 0.804802 0.504964 0 0
1 – 0.125441 - 0.567752 1 0
2 – 0.804802 0.273019 2 0
3 – 0.804802 0.273019 3 0
4 – 0.309641 - 0.567782 4 0
Output:
Out[8]:
SVC(c=1.0,cache_size=200,class
weigh=None,coef=0.0,decision_function_shape=‘ovr’,degree=3,gamma=
‘auto_department’, kernal = ‘linear’, max_iter= -1, probability = false,
random_state=0,shrinking=true, tol = 0.001, verbose = flase
y_pred = classifier.predict(x_test)
Output:
Y_pred:
0
0 0
1 0
2 0
3 0
4 0
Output:
Cm:
0 1
0 66 2
1 8 24
Output:
Output:
Week 9: Experiment 9:
Aim: Write a program to implement k-means algorithm to classify the iris
dataset print both correct and wrong predictions.
Dataset:
Index Customer ID Gender Age Annual income Spending Score
0 1 Male 19 15 39
1 2 Male 21 15 81
2 3 Female 20 16 6
3 4 Female 23 16 77
4 5 Female 31 17 40
Program:
import numpy as np
import matplotlib.pyplot as mtp
import pandas as pd
dataset = pd.read_csv(‘ Mall_Customers_data.csv’)
x = dataset.cluster import kmeans
wcss_list = []
for i in range(1,11):
kmeans = kmeans(n_clusters = i,init=‘k-means++’,random_state=40)
kmeans.fit(x)
wcss_list.append(kmeans.inertia)
mtp.plot(range(1,11),wcss_list)
mtp.title(‘the eblow method graph’)
mtp.xlabel(‘Number of clusters(k)’)
mtp.ylabel(‘wcss_list’)
mtp.show()
Output:
Wcss_list:
Output:
Experiment 10:
Aim: Write a program to implement priniciple component analysis
Dataset:
Win Aloch Malic Ash ACI Mg Phenols Flavnoid Non pronth Color.int
e ol acid Flavour
1 14.23 1.71 2.43 15.6 127 2.8 3.06 0.28 2.29 5.64
2 13.2 1.78 2.14 11.2 100 2.65 2.76 0.26 1.28 4.38
1 13.16 2.36 2.67 18.6 101 2.8 3.24 0.3 2.81 5.68
1 14.37 1.95 2.5 16.8 113 3.85 3.49 0.24 2.18 7.8
1 13.24 2.59 2.87 21 118 2.8 2.69 0.39 1.82 4.32
Program:
Import numpy as np
Import matplotlib.pyplot as plt
Import pandas as pd
Dataset = pd.read_csv(‘wine.csv’)
x = dataset.iloc[:,0:13].values
y = dataset.iloc[:,13].values
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=,random_state=0)
from sklearn.preprocessing import standardscalar
sc= standardscalar()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
from sklearn.decomposition import pca
pca = pca(n_components = 2)
x_train = pca.fit_transform(x_train)
x_test = pca.transform(x_test)
explained_variance = pca.explained_varaiance_ratio
from sklearn.linear_model import Logistic regression
classifier = Logistic regression(random_state=0)
classifier.fit(x_train,y_train)
Output:
LogisticRegression(random_state=0)
Y_pred = classifier.predict(x_test)
From sklearn.metrics import confusion_matrix
Cm = confusion_matrix(y_test,y_pred)
From matplotlib.colors import listedcolormap
X_set,y_set = x_train,y_train
X1,X2 =
np.meshgrid(np.arrange(start=x_set[:,0],min()_1,stop=x_step[;,0],max()+1,step=0.0
1)np. Arrange(start=x_set[:,1].min()_1,stop=x_set[:,1],max()+1,step=0.01))
Plt.contourf(x1,x2,classifier.predict(np.array([x1.ravel(),x1.ravel()]).t).reshape(x1.sh
ape),alpha=0.75,cmap=Listedcolormap((‘yellow’,‘White’,‘aquamarine’)))
<matplotlib.contour.QuadContourset object at 0X0000025CDA2F70D>
Plt.xlim(x1.min(),x1.max())
(-4.79510714681534,5.0748928530844405)
Plt.ylim(x2.min(),x2.max())
(-4.576825766462532,4.753174233537069)
For i,j in enumerate(np.unique(y_set)):
Plt.title(‘Logistic Regression(training set)’)
Text(0.5,1.0,‘ Logistic Regression(training set)’)
Plt.xlabel(‘pc1’)
Text(0.5,0,‘ pc1’)
Plt.ylabel(‘pc2’)
Text(0,0.5,‘ pc2’)
Plt.show()
Output: