0% found this document useful (0 votes)
26 views

(ML) Machine Learning Lab Manual

Uploaded by

rtnova234
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
26 views

(ML) Machine Learning Lab Manual

Uploaded by

rtnova234
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 25

S.

no Experiments Dates
1. Implement and demonstrate the FIND-S algorithm for finding the most specific 20/1/23
hypothesis based on a given set of training data samples. Read the training data
from a .CSV file.
2. 10/2/23
Python Implementation of Candidate-Elimination
3. Write s program to demonstrate working of decision tree based ID3 algorithm 17/2/23

4. Exercises to solve the real world problems using the following machine learning methods. 3/3/23

• Linear Regression
• Logistic Regression
• Binary classifier

5. Develop a program for Bias, Variance, Remove duplicates , Cross 17/3/23


Validation
6. Build an Artificial Neural Networks by Implementing the Back Propagation Algorithm and 17/3/23
text the same using appropriate data sets
7. Write a program to implement categorical Encoding. One-Hot Encoding 24/3/23

8. Write a program to Implement support vector machine 31/3/23

9. Write a program to implement k-means algorithm to classify the iris dataset print 19/4/23
both correct and wrong predictions
10. Write a program to implement principle component analysis 19/4/23

WEEK – 1: EXPERIMENT 1:
1. Implement and demonstrate the FIND-S algorithm for finding the most specific
hypothesis based on a given set of training data samples. Read the training data
from a .CSV file.
FIND-S Algorithm
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
For each attribute constraint ai in h
If the constraint ai is satisfied by x
Then do nothing
Else replace ai in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
Training Examples:
Example Sky AirTemp Humidity Wind Water Forecast EnjoySport
1 Sunny Warm Normal Strong Warm Same Yes
2 Sunny Warm High Strong Warm Same Yes
3 Rainy Cold High Strong Warm Change No
4 Sunny Warm High Strong Cool Change Yes

Program:
import csv
num_attributes = 6
a = []
print("\n The Given Training Data Set \n")
with open('enjoysport.csv', 'r') as csvfile:
reader = csv.reader(csvfile)
for row in reader:
a.append (row)
print(row)
print("\n The initial value of hypothesis: ")
hypothesis = ['0'] * num_attributes
print(hypothesis)
for j in range(0,num_attributes):
hypothesis[j] = a[0][j];
print("\n Find S: Finding a Maximally Specific Hypothesis\n")
for i in range(0,len(a)):
if a[i][num_attributes]=='yes':
for j in range(0,num_attributes):
if a[i][j]!=hypothesis[j]:
hypothesis[j]='?'
else :
hypothesis[j]= a[i][j]
print(" For Training instance No:{0} the hypothesis is
".format(i),hypothesis)
print("\n The Maximally Specific Hypothesis for a given Training
Examples :\n")
print(hypothesis)
Data Set:
sunny warm normal strong warm same yes
sunny warm high strong warm same yes
rainy cold high strong warm change no
sunny warm high strong cool change yes
Output:
The Given Training Data Set
['sunny', 'warm', 'normal', 'strong', 'warm', 'same', 'yes']
['sunny', 'warm', 'high', 'strong', 'warm', 'same', 'yes']
['rainy', 'cold', 'high', 'strong', 'warm', 'change', 'no']
['sunny', 'warm', 'high', 'strong', 'cool', 'change', 'yes']
The initial value of hypothesis:
['0', '0', '0', '0', '0', '0']
Find S: Finding a Maximally Specific Hypothesis
For Training Example No:0 the hypothesis is
['sunny', 'warm', 'normal', 'strong', 'warm', 'same']
For Training Example No:1 the hypothesis is
['sunny', 'warm', '?', 'strong', 'warm', 'same']
For Training Example No:2 the hypothesis is
'sunny', 'warm', '?', 'strong', 'warm', 'same']
For Training Example No:3 the hypothesis is
'sunny', 'warm', '?', 'strong', '?', '?']
The Maximally Specific Hypothesis for a given Training Examples:
['sunny', 'warm', '?', 'strong', '?', '?']

WEEK-2: EXPERIMENT 2 :
AIM: Python Implementation of Candidate-Elimination

Below is the algorithm for Candidate-Elimination

● Firstly, read the data from the CSV file.


● Initialize General and Specific Hypothesis.
● If the example is positive, [Follow Find-S algorithm]
● If attribute == hypothesis value then do nothing.
Else
● make the attribute more general i.e replace the attribute with ?

● If the example is negative


● Make the generalized hypothesis more specific.

Below is the code for Candidate-Elimination


Contents in candidate.csv

sky, air temp, humidity, wind, water, for cast, enjoy sport.
sunny, warm, normal, strong, ,warm, same, yes.
sunny, warm, high, strong, warm, same, yes.
rainy, cold, high, strong, warm, change, no.
sunny, warm, high, strong, cool, change, yes.

program
import numpy as np
import pandas as pd
# Reading the data from CSV file
data = pd.read_csv('candidate.csv')
concepts = np.array(data.iloc[:,:-1])
print("\nInstances are:\n",concepts)
target = np.array(data.iloc[:,-1])
print("\nTarget Values are: ",target)
def train(concepts, target):

# Initializing general and specific hypothesis


specific_h = concepts[0].copy()
print("\nInitialization of specific hypothesis and general hypothesis")
print("\nSpecific Boundary: ", specific_h)
general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
print("\nGeneric Boundary: ",general_h)

for i, val in enumerate(concepts):


print("\nInstance", i+1 , "is ", val)
#positive example
if target[i] == "yes":
print("Instance is Positive ")
for x in range(len(specific_h)):
if val[x]!= specific_h[x]:
specific_h[x] ='?'
general_h[x][x] ='?'
#negative example
if target[i] == "no":
print("Instance is Negative ")
for x in range(len(specific_h)):
if val[x]!= specific_h[x]:
general_h[x][x] = specific_h[x]
else:
general_h[x][x] = '?'

print("Specific Bundary after ", i+1, "Instance is ", specific_h)


print("Generic Boundary after ", i+1, "Instance is ", general_h)
print("\n")

indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]

for i in indices:
general_h.remove(['?', '?', '?', '?', '?', '?'])

return specific_h, general_h


s_final, g_final = train(concepts, target)
# displaying Specific_hypothesis
print("Final Specific_h: ", s_final, sep="\n")
# displaying Generalized_Hypothesis
print("Final General_h: ", g_final, sep="\n")
OUTPUT:
Instances are:
[['\twarm' '\tnormal' '\tstrong' '\t' 'warm' '\tsame']
['\twarm' '\thigh' '\tstrong' '\twarm' '\tsame' '\tyes.']
['\tcold' '\thigh' '\tstrong' '\twarm' '\tchange' '\tno.']
['\twarm' '\thigh' '\tstrong' '\tcool' '\tchange' '\tyes.']]
Target Values are: ['\tyes.' nan nan nan]
Initialization of specific hypothesis and general hypothesis
Specific Boundary: ['\twarm' '\tnormal' '\tstrong' '\t' 'warm' '\tsame']
Generic Boundary: [['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?',
'?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
Instance 1 is ['\twarm' '\tnormal' '\tstrong' '\t' 'warm' '\tsame']
Specific Bundary after 1 Instance is ['\twarm' '\tnormal' '\tstrong' '\t' 'warm' '\tsame']
Generic Boundary after 1 Instance is [['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?',
'?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
Instance 2 is ['\twarm' '\thigh' '\tstrong' '\twarm' '\tsame' '\tyes.']
Specific Bundary after 2 Instance is ['\twarm' '\tnormal' '\tstrong' '\t' 'warm' '\tsame']
Generic Boundary after 2 Instance is [['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?',
'?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
Instance 3 is ['\tcold' '\thigh' '\tstrong' '\twarm' '\tchange' '\tno.']
Specific Bundary after 3 Instance is ['\twarm' '\tnormal' '\tstrong' '\t' 'warm' '\tsame']
Generic Boundary after 3 Instance is [['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?',
'?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]

Instance 4 is ['\twarm' '\thigh' '\tstrong' '\tcool' '\tchange' '\tyes.']


Specific Bundary after 4 Instance is ['\twarm' '\tnormal' '\tstrong' '\t' 'warm' '\tsame']
Generic Boundary after 4 Instance is [['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?',
'?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
Final Specific_h:
['\twarm' '\tnormal' '\tstrong' '\t' 'warm' '\tsame']
Final General_h:
[]

WEEK-3: EXPERIMENT 3:
AIM: Write s program to demonstrate working of decision tree based ID3 algorithm
Day Outlook Temperature Humidity Wind PlayTennis

D1 Sunny Hot High Weak No

D2 Sunny Hot High Strong No

D3 Overcast Hot High Weak Yes

D4 Rain Mild High Weak Yes

D5 Rain Cool Normal Weak Yes

D6 Rain Cool Normal Strong No

D7 Overcast Cool Normal Strong Yes

D8 Sunny Mild High Weak No

D9 Sunny Cool Normal Weak Yes

D10 Rain Mild Normal Weak Yes

D11 Sunny Mild Normal Strong Yes

D12 Overcast Mild High Strong Yes

D13 Overcast Hot Normal Weak Yes

D14 Rain Mild High Strong No

Program:
import pandas as pd
import math
import numpy as np
data = pd.read_csv("dataset.csv")
features = [feat for feat in data]
features.remove("answer")
class Node:
def __init__(self):
self.children = []
self.value = ""
self.isLeaf = False
self.pred = ""
def entropy(examples):
pos = 0.0
neg = 0.0
for _, row in examples.iterrows():
if row["answer"] == "yes":
pos += 1
else:
neg += 1
if pos == 0.0 or neg == 0.0:
return 0.0
else:
p = pos / (pos + neg)
n = neg / (pos + neg)
return -(p * math.log(p, 2) + n * math.log(n, 2))

def info_gain(examples, attr):


uniq = np.unique(examples[attr])
#print ("\n",uniq)
gain = entropy(examples)
#print ("\n",gain)
for u in uniq:
subdata = examples[examples[attr] == u]
#print ("\n",subdata)
sub_e = entropy(subdata)
gain -= (float(len(subdata)) / float(len(examples))) * sub_e
#print ("\n",gain)
sub_e = entropy(subdata)
gain -= (float(len(subdata)) / float(len(examples))) * sub_e
#print ("\n",gain)
return gain
def ID3(examples, attrs):
root = Node()

max_gain = 0
max_feat = ""
for feature in attrs:
#print ("\n",examples)
gain = info_gain(examples, feature)
if gain > max_gain:
max_gain = gain
max_feat = feature
root.value = max_feat
#print ("\nMax feature attr",max_feat)
uniq = np.unique(examples[max_feat])
#print ("\n",uniq)
for u in uniq:
#print ("\n",u)
subdata = examples[examples[max_feat] == u]
#print ("\n",subdata)
if entropy(subdata) == 0.0:
newNode = Node()
newNode.isLeaf = True
newNode.value = u
newNode.pred = np.unique(subdata["answer"])
root.children.append(newNode)
else:
dummyNode = Node()
dummyNode.value = u
new_attrs = attrs.copy()
new_attrs.remove(max_feat)
child = ID3(subdata, new_attrs)
dummyNode.children.append(child)
root.children.append(dummyNode)

return root
def printTree(root: Node, depth=0):
for i in range(depth):
print("\t", end="")
print(root.value, end="")
if root.isLeaf:
print(" -> ", root.pred)
print()
for child in root.children:
printTree(child, depth + 1)
def classify(root: Node, new):
for child in root.children:
if child.value == new[root.value]:
if child.isLeaf:
print ("Predicted Label for new example", new," is:", child.pred)
exit
else:
classify (child.children[0], new)
root = ID3(data, features)
print("Decision Tree is:")
printTree(root)
print ("------------------")

new = {"outlook":"sunny", "temperature":"hot", "humidity":"normal", "wind":"strong"}


classify (root, new)

Week 4 : Experiment 4:
Aim: Exercises to solve the real world problems using the following machine
learning methods.

• Linear Regression
• Logistic Regression
• Binary classifier

Program:
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
dataset = load_breast_cancer(as_frame=True)
dataset['data'].head()
dataset['target'].head()
dataset['target'].value_counts()
X = dataset['data']
y = dataset['target']
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y , test_size=0.25, random_state=0)

from sklearn.preprocessing import StandardScaler

ss_train = StandardScaler()
X_train = ss_train.fit_transform(X_train)
ss_test = StandardScaler()
X_test = ss_test.fit_transform(X_test)

predictions = model.predict(X_test)
models = {}
from Sklearn.linear_model import LogisticRegression
models(‘logistic Regression’) = logistic Regression()
from Sklearn.svm import Linearsvc
models(‘Support Vector Machine’) = Linearsvc()
from Sklearn.tree import DecisionTreeclassifier
models(‘Decision Trees’) = DecisionTreeclassifier ()
from Sklearn.ensemble import RandomForestclassifier
models(‘Random Forest’) = RandomForestclassifier ()
from Sklearn.neighbors import kneighborsclassifier
models(‘k_Nearest neighbors’) = kneighborsclassifier()
from Sklearn.metrics import accuracy_score,precision_score,recall_score
accuracy,precision,recall={}{}{}

for key in models.keys():


models[key].fit(x_train,y_train)
predictions = models[key].predict(x_test)
accuracy[key] = accuracy_score(predictions,y_test)
precision[key] = precision_score(predictions,y_test)
recall[key] = recall_score(predictions,y_test)
logistic Regression()
linear svc()
decisiontree classifier()
randomForestclassifier()
GaussianNB()
KNeighbourclassifier()
Import pandas as pd
df_model = pd.dataframe(index = models.keys(),columns =
[‘Accuracy’,’precisioj’,’recall’])
df_model(‘Accuracy’) = accuracy.values()
df_model(‘precision’) = precision.values()
df_model(‘recall’) = recall.values()
df_model

Output:
Accuracy precision recall
Logistic Regression 0.95804 0.955556 0.977273
Support Vector 0.937063 0.93333 0.965517
Machine
Decision tree 0.881119 0.84444 0.962025
Random Forest 0.965035 0.95556 0.988506
Naïve Bayes 0.937063 0.95556 0.945055
K-Nearest neighbor 0.951049 0.98889 0.936842

Week 5 : Experiment 5:
AIM: Develop a program for Bias, Variance, Remove duplicates , Cross Validation

Program:
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from mlxtend.evaluate import bias_variance_decomp
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
# separate into inputs and Outputs
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
# split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
# define the model
model = LinearRegression()
# estimate bias and variance
mse, bias, var = bias_variance_decomp(model, X_train, y_train, X_test, y_test, loss='mse',
num_rounds=200, random_seed=1)
# summarize results
print('MSE: %.3f' % mse)
print('Bias: %.3f' % bias)
print('Variance: %.3f' % var)

Output:
MSE: 22.418
BIAS: 20.744
VARAINCE: 1.674

Week 6: Experiment 6:
AIM: Build an Artificial Neural Networks by Implementing the Back
Propagation Algorithm and text the same using appropriate data sets
Program:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
data = load_iris()
X=data.data
y=data.target
y = pd.get_dummies(y).values
y[:3]

Output:

array([[1, 0, 0],

[1, 0, 0],
[1, 0, 0]], dtype=uint8)

Program:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=20,
random_state=4)
learning_rate = 0.1
iterations = 5000
N = y_train.size
input_size = 4
hidden_size = 2
output_size = 3
results = pd.DataFrame(columns=["mse", "accuracy"])
np.random.seed(10)
W1 = np.random.normal(scale=0.5, size=(input_size, hidden_size))
W2 = np.random.normal(scale=0.5, size=(hidden_size , Output_size))
def sigmoid(x):
return 1 / (1 + np.exp(-x))
def mean_squared_error(y_pred, y_true):
return ((y_pred - y_true)**2).sum() / (2*y_pred.size)
def accuracy(y_pred, y_true):
acc = y_pred.argmax(axis=1) == y_true.argmax(axis=1)
return acc.mean()
for itr in range(iterations):
Z1 = np.dot(x_train, W1)
A1 = sigmoid(Z1)
Z2 = np.dot(A1, W2)
A2 = sigmoid(Z2)
mse = mean_squared_error(A2, y_train)
acc = accuracy(A2, y_train)
results=results.append({"mse":mse, "accuracy":acc},ignore_index=True )
E1 = A2 - y_train
dW1 = E1 * A2 * (1 - A2)
E2 = np.dot(dW1, W2.T)
dW2 = E2 * A1 * (1 - A1)
W2_update = np.dot(A1.T, dW1) / N
W1_update = np.dot(x_train.T, dW2) / N
W2 = W2 - learning_rate * W2_update
W1 = W1 - learning_rate * W1_update

results.mse.plot(title="Mean Squared Error")


Output:
results.accuracy.plot(title="Accuracy")

Output:

Program:
# feedforward
Z1 = np.dot(x_test, W1)
A1 = sigmoid(Z1)

Z2 = np.dot(A1, W2)
A2 = sigmoid(Z2)

acc = accuracy(A2, y_test)


print("Accuracy: {}".format(acc))

Output:

Accuracy: 0.8

Week 7: Experiment 7:
Aim: Write a program to implement categorical Encoding. One-Hot Encoding
Program:
Import numpy as np
Colors = [“red”,”green”,”yellow”,”red”,”blue”]
total_colors=[“red”,”green”,”blue”,”black”,”yellow”]
mapping={}
for X in range (len(total_colors)):
mapping[total_colors[x]]=x
one_hot_encide=[]
for C in colors:
arr=list(np.zeros(len(total_colors),dtype=int))
arr[Mapping[c]]=1
one_hot_encode.append(a11)
print(one_hot_encode)

Output:
[[1,0,0,0,0],[0,1,0,0,0],[0,0,0,0,1],[1,0,0,0,0],[0,0,1,0,0]]

from Sklearn.preprocessing import label encoder


from Sklearn.preprocessing import onehot encoder
colors=([“red”,”green”,”yellow”,”red”,”blue”])
label_encoder=LabelEncoder()
integer_encoder=Label_encoder.fit_transform(colors)
print(integer_encoded)
[21320]
Integer_encoded = integer_encoded.reshape(len(integer_encoded),1)
One hot_encoder = one Hot Encoder(Sparse = false)
One hot_encoder = one Hot Encoder.fit_transform(integer_encoded)
Print(onehot_encoded)
Output:
[ [ 0.0.1.0.]
[ 0.1.0.0.]
[ 0.0.0.1.]
[ 0.0.1.0.]
[ 1.0.0.0.] ]

Week 8: Experiment 8:
Aim: Write a program to Implement support vector machine
Dataset:
Index user ID Gender Age Estimated salary purchased
0 15624510 Male 19 19000 0
1 15810944 Male 35 20000 0
2 15668575 Female 26 43000 0
3 15603246 Female 27 57000 0
4 15804002 Male 19 76000 0

Program:
Import numpy as nm
Import matplotlib.pyplot as mtp
Import pandas as pd
Data_set= pd.read_csv(‘user_data.csv’)
x=data_set.iloc[:[2,3]].values
y= data_set.iloc[:,4].values
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=
train_test_split(x,y,test_size=0.25,random_state=0)
from sklearn.preprocessing import standardscale()
st_x= Standardscaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.fit_transform(x_test)

x_test y_test
0 1 0
0 – 0.804802 0.504964 0 0
1 – 0.125441 - 0.567752 1 0
2 – 0.804802 0.273019 2 0
3 – 0.804802 0.273019 3 0
4 – 0.309641 - 0.567782 4 0

From sklearn.svm import svc


Classifier = svc(kernal = ‘linear’, randomstate = 0)
Classifier.fit(x_train,y_train)

Output:
Out[8]:
SVC(c=1.0,cache_size=200,class
weigh=None,coef=0.0,decision_function_shape=‘ovr’,degree=3,gamma=
‘auto_department’, kernal = ‘linear’, max_iter= -1, probability = false,
random_state=0,shrinking=true, tol = 0.001, verbose = flase
y_pred = classifier.predict(x_test)

Output:
Y_pred:
0

0 0
1 0
2 0
3 0
4 0

From sklearn.metrics import confusion_matrix


Cm = confusion_matrix(y_test,y_pred)

Output:
Cm:

0 1

0 66 2
1 8 24

From matplotlib.colors import listedcolormap


X_set,Y_set = X_train,Y_train
x1,x2 = nm.meshgrid(nn.arange(start = x_set[:,0].min_1,stop =
x_sset[:,0].max()+1,step=0.01)
nm.araige(start=x_set[: 1].min() – 1,stop: x_set[:,1].max()+1,step = 0.01))
mtp.contourf(x1,x2,classifier.predict(nm.array([x1.ravel(),x2.ravel()].7).reshap
e(x1.shape)
alpha=0.75,cmap=listedcolormap((‘red’, ‘green’)))
mtp.xlim(x1.mine(),x1.max())
mtp.ylim(x1.mine(),x1.max())
for I,j in enumerate(nm.uniquecy(y_set)):
mtp.scatter(x.set,y.set = = j,0],x_set[y_set==j,1],c=Listedcolormap( ‘red’,
‘green’)](1),label = j)
mtp.title(‘SVM classifier(Training set)’)
mtp.xlabel(‘age’)
mtp.ylabel(‘Estimated salary’)
mtp.Legend()
mtp.show()

Output:

From matplotlib.color import Listedcolormap


x_set,y_set = x_test,y_test
x1,x2 = nm.meshgrid(nm.arange(start = x_set[:,0].min()-1,
stop:x_set[:,0].max()+1,step = 0.01)
nm.arange(start=x.set[:,1].min()-1,stop=x_set[:,1].max()+1,step=0.01))
mtp.contour(x1,x2,classifier.predict(nm.array([x1ravel(),x2ravel)].T).reshape(x
1.shape)
alpha = 0.75,cmap=Listedcolormap((‘red’,‘green’))
mtp.xlim(x1.min(),x1.max())
mtp.ylim(x2.min(),x2.max())
for I,j in enumerate(nm.unique(y_set)):
mtp.scatter(x_set[y_set = =j,0],x_set[y_set = =j,1]
c = Listedcolormap((‘red’,‘green’))(i),label = j)
mtp.tittle(‘svm classifier(test set)’)
mtp.xlabel(‘Age’)
mtp.ylabel(‘Estimated Salary’)
mtp.legent()
mtp.show()

Output:
Week 9: Experiment 9:
Aim: Write a program to implement k-means algorithm to classify the iris
dataset print both correct and wrong predictions.

Dataset:
Index Customer ID Gender Age Annual income Spending Score
0 1 Male 19 15 39
1 2 Male 21 15 81
2 3 Female 20 16 6
3 4 Female 23 16 77
4 5 Female 31 17 40

Program:
import numpy as np
import matplotlib.pyplot as mtp
import pandas as pd
dataset = pd.read_csv(‘ Mall_Customers_data.csv’)
x = dataset.cluster import kmeans
wcss_list = []
for i in range(1,11):
kmeans = kmeans(n_clusters = i,init=‘k-means++’,random_state=40)
kmeans.fit(x)
wcss_list.append(kmeans.inertia)
mtp.plot(range(1,11),wcss_list)
mtp.title(‘the eblow method graph’)
mtp.xlabel(‘Number of clusters(k)’)
mtp.ylabel(‘wcss_list’)
mtp.show()

Output:

Wcss_list:

Index Type Size Value


0 float64 1 26991028
1 float64 1 106348.37306211118
2 float64 1 181363.595959596
3 float64 1 73679.78903948834
4 float64 1 4448.45544793371

Kmeans = kmeans(n_cluster = 5, init = ‘k_means++’. Random_state = 42)


y_predict = kmeans.fit_predict(x)
mtp.scatter(x[y]_predict == 0,0],x[y_predict == 0,1],s=100,c=
‘blue’,label=cluster1’)
mtp.scatter(x[y]_predict == 1,0],x[y_predict == 1,1],s=100,c=
‘green’,label=cluster2’)
mtp.scatter(x[y]_predict == 2,0],x[y_predict == 2,1],s=100,c=
‘red’,label=cluster3’)
mtp.scatter(x[y]_predict == 3,0],x[y_predict == 3,1],s=100,c=
‘cyan’,label=cluster4’)
mtp.scatter(x[y]_predict == 4,0],x[y_predict == 4,1],s=100,c=
‘magneta’,label=cluster5’)
mtp.scatter(kmeans.cluster_centers_[:,0],kmeans.cluster.centers_[:,1],s=300,c=‘yell
ow’,label=‘centroid’)
mtp.tittle(‘cluster of customers’)
mtp.xlabel(‘Annual Income(k)’)
mtp.ylabel(‘spending score’)
mtp.legend()
mtp.show()

Output:

Experiment 10:
Aim: Write a program to implement priniciple component analysis
Dataset:
Win Aloch Malic Ash ACI Mg Phenols Flavnoid Non pronth Color.int
e ol acid Flavour
1 14.23 1.71 2.43 15.6 127 2.8 3.06 0.28 2.29 5.64
2 13.2 1.78 2.14 11.2 100 2.65 2.76 0.26 1.28 4.38
1 13.16 2.36 2.67 18.6 101 2.8 3.24 0.3 2.81 5.68
1 14.37 1.95 2.5 16.8 113 3.85 3.49 0.24 2.18 7.8
1 13.24 2.59 2.87 21 118 2.8 2.69 0.39 1.82 4.32
Program:
Import numpy as np
Import matplotlib.pyplot as plt
Import pandas as pd
Dataset = pd.read_csv(‘wine.csv’)
x = dataset.iloc[:,0:13].values
y = dataset.iloc[:,13].values
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=,random_state=0)
from sklearn.preprocessing import standardscalar
sc= standardscalar()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
from sklearn.decomposition import pca
pca = pca(n_components = 2)
x_train = pca.fit_transform(x_train)
x_test = pca.transform(x_test)
explained_variance = pca.explained_varaiance_ratio
from sklearn.linear_model import Logistic regression
classifier = Logistic regression(random_state=0)
classifier.fit(x_train,y_train)

Output:
LogisticRegression(random_state=0)
Y_pred = classifier.predict(x_test)
From sklearn.metrics import confusion_matrix
Cm = confusion_matrix(y_test,y_pred)
From matplotlib.colors import listedcolormap
X_set,y_set = x_train,y_train
X1,X2 =
np.meshgrid(np.arrange(start=x_set[:,0],min()_1,stop=x_step[;,0],max()+1,step=0.0
1)np. Arrange(start=x_set[:,1].min()_1,stop=x_set[:,1],max()+1,step=0.01))
Plt.contourf(x1,x2,classifier.predict(np.array([x1.ravel(),x1.ravel()]).t).reshape(x1.sh
ape),alpha=0.75,cmap=Listedcolormap((‘yellow’,‘White’,‘aquamarine’)))
<matplotlib.contour.QuadContourset object at 0X0000025CDA2F70D>
Plt.xlim(x1.min(),x1.max())
(-4.79510714681534,5.0748928530844405)
Plt.ylim(x2.min(),x2.max())
(-4.576825766462532,4.753174233537069)
For i,j in enumerate(np.unique(y_set)):
Plt.title(‘Logistic Regression(training set)’)
Text(0.5,1.0,‘ Logistic Regression(training set)’)
Plt.xlabel(‘pc1’)
Text(0.5,0,‘ pc1’)
Plt.ylabel(‘pc2’)
Text(0,0.5,‘ pc2’)
Plt.show()

Output:

You might also like