Machine learning
LAB EXPERIMENT : 1
Question: For a given set of training data examples stored in a .CSV file,
implement and demonstrate the FIND-S algorithm to output the most specific
hypothesis consistent with the training examples.
Answer:
Implementation Steps:
* Read CSV Data: Use Python’s pandas library to read the training data from
a specified .csv file. The CSV should have columns representing features and
the last column representing the class label (e.g., ‘Yes’ for positive, ‘No’ for
negative).
* Initialize Hypothesis: Start with the hypothesis set to None and scan the
training examples, considering only the positive ones:
* If the hypothesis is still in its initial None state, set the hypothesis to the
feature values of this first positive example.
* If the hypothesis has already been set, compare each feature value of the
current positive example with the corresponding value in the current
hypothesis; wherever they disagree, replace that attribute in the hypothesis
with ‘?’.
Demonstration:
# finds_data.csv
Color,Shape,Size,Class
Red,Circle,Small,Yes
Red,Square,Large,Yes
Red,Circle,Large,Yes
Blue,Circle,Small,No
Python Code:
import pandas as pd

def find_s_algorithm(file_path):
    df = pd.read_csv(file_path)
    features = df.iloc[:, :-1].values   # every column except the last
    labels = df.iloc[:, -1].values      # the last column is the class label
    num_attributes = len(features[0])
    hypothesis = [None] * num_attributes   # most specific initial hypothesis
    for i, example in enumerate(features):
        if labels[i] == 'Yes':             # Find-S ignores negative examples
            if hypothesis[0] is None:
                hypothesis = list(example)  # first positive example initialises h
            else:
                for j in range(num_attributes):
                    if hypothesis[j] != example[j]:
                        hypothesis[j] = '?'  # generalise conflicting attributes
    return hypothesis

# Demonstrate
file = 'finds_data.csv'
specific_hypothesis = find_s_algorithm(file)
print("Most specific hypothesis:", specific_hypothesis)
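On the demonstration data above, the three positive examples agree only on
Color, so the script should print ['Red', '?', '?'] as the most specific
hypothesis.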
LAB EXPERIMENT : 2
Question: For a given set of training data examples stored in a .CSV
file, implement and demonstrate the Candidate-Elimination
algorithm to output a description of the set of all hypotheses
consistent with the training examples.
Answer:
Implementation Steps:
* Initialize Boundaries:
* Initialize the specific boundary S with the most specific hypothesis: [‘∅’,
‘∅’, ...] (where ‘∅’ matches no value).
* Initialize the general boundary G with the most general hypothesis: [[‘?’,
‘?’, ...]] (where ‘?’ matches any value).
* Positive Example: Generalise S minimally so that it covers the example, and
remove from G any hypothesis that is inconsistent with the example.
* Negative Example: Specialise the members of G minimally so that they
exclude the example while remaining more general than S.
Demonstration:
# ce_data.csv (file name assumed, to match the code below)
Sky,AirTemp,Humidity,Wind,Water,Forecast,EnjoySport
Sunny,Warm,Normal,Strong,Warm,Same,Yes
Sunny,Warm,High,Strong,Warm,Same,Yes
Rainy,Cold,High,Strong,Warm,Change,No
Sunny,Warm,High,Weak,Warm,Same,Yes
Python Code:
import pandas as pd

def is_consistent(h, example):
    # a hypothesis matches an example iff every attribute is '?' or equal
    return all(h[j] in ('?', example[j]) for j in range(len(example)))

def candidate_elimination(file_path):
    df = pd.read_csv(file_path)
    features = df.iloc[:, :-1].values
    labels = df.iloc[:, -1].values
    num_attributes = len(features[0])
    S = ['∅'] * num_attributes      # most specific boundary
    G = [['?'] * num_attributes]    # most general boundary
    for i, example in enumerate(features):
        label = labels[i]
        if label == 'Yes':
            # generalise S minimally so it covers the positive example
            for j in range(num_attributes):
                if S[j] == '∅':
                    S[j] = example[j]
                elif S[j] != example[j]:
                    S[j] = '?'
            # drop general hypotheses inconsistent with the example
            G = [g for g in G if is_consistent(g, example)]
        else:
            # specialise G minimally so it excludes the negative example
            G_next = []
            for g in G:
                for j in range(num_attributes):
                    if g[j] == '?' and S[j] not in ('∅', '?', example[j]):
                        h = g.copy()
                        h[j] = S[j]
                        G_next.append(h)
            G = G_next
    return S, G

# Demonstrate
S, G = candidate_elimination('ce_data.csv')   # file name matches the CSV above
print("Specific boundary S:", S)
print("General boundary G:", G)
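With the EnjoySport rows above, this sketch ends with S = ['Sunny', 'Warm',
'?', '?', 'Warm', 'Same'] and G holding the three hypotheses ['Sunny', '?',
'?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?'], and ['?', '?', '?',
'?', '?', 'Same'], each of which excludes the single negative example while
covering all positives.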
LAB EXPERIMENT : 3
Question: Write a program to demonstrate the working of the
decision tree based ID3 algorithm. Use an appropriate data set for
building the decision tree and apply this knowledge to classify a
new sample.
Answer:
Implementation Steps:
* Base Cases: If all examples have the same class or no attributes left,
return a leaf node.
* Recursive Step:
* Select the attribute with the highest information gain as the splitting
attribute.
* For each value of the splitting attribute, create a branch and recursively
call the ID3 function on the subset of data corresponding to that value
(excluding the splitting attribute).
* Classify New Sample: Traverse the tree based on the attribute values of
the new sample to reach a leaf node, which gives the classification.
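Two standard quantities drive the attribute choice in the recursive step:
entropy, H(S) = −Σ p_i · log2(p_i) over the class proportions p_i, and
information gain, Gain(S, A) = H(S) − Σ_v (|S_v| / |S|) · H(S_v), where S_v is
the subset of S for which attribute A takes value v. For the 14-row PlayTennis
set below (9 Yes, 5 No), H(S) = −(9/14)log2(9/14) − (5/14)log2(5/14) ≈ 0.940,
and Outlook yields the highest gain, so it becomes the root of the tree.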
Demonstration:
# id3_data.csv
Outlook,Temperature,Humidity,Windy,PlayTennis
Sunny,Hot,High,False,No
Sunny,Hot,High,True,No
Overcast,Hot,High,False,Yes
Rainy,Mild,High,False,Yes
Rainy,Cool,Normal,False,Yes
Rainy,Cool,Normal,True,No
Overcast,Cool,Normal,True,Yes
Sunny,Mild,High,False,No
Sunny,Cool,Normal,False,Yes
Rainy,Mild,Normal,False,Yes
Sunny,Mild,Normal,True,Yes
Overcast,Mild,High,True,Yes
Overcast,Hot,Normal,False,Yes
Rainy,Mild,High,True,No
Python Code:
import pandas as pd
import math

def entropy(data, target):
    # H(S) = -sum(p * log2(p)) over the class proportions
    probs = data[target].value_counts(normalize=True)
    return -sum(p * math.log2(p) for p in probs)

df = pd.read_csv('id3_data.csv')
target = 'PlayTennis'
print("Entropy of the full data set:", entropy(df, target))
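The listing above computes only the entropy of the full data set. A complete
recursive ID3, following the implementation steps, might look like the sketch
below; the helper names (information_gain, id3, classify) and the new sample
used at the end are illustrative assumptions rather than part of the original
listing.
import pandas as pd
import math

def entropy(data, target):
    probs = data[target].value_counts(normalize=True)
    return -sum(p * math.log2(p) for p in probs)

def information_gain(data, attribute, target):
    # Gain(S, A) = H(S) - sum(|S_v|/|S| * H(S_v)) over the values v of A
    weighted = sum((len(subset) / len(data)) * entropy(subset, target)
                   for _, subset in data.groupby(attribute))
    return entropy(data, target) - weighted

def id3(data, attributes, target):
    if data[target].nunique() == 1:      # all examples share one class
        return data[target].iloc[0]
    if not attributes:                   # no attributes left: majority class
        return data[target].mode()[0]
    best = max(attributes, key=lambda a: information_gain(data, a, target))
    tree = {best: {}}
    for value, subset in data.groupby(best):  # one branch per attribute value
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, remaining, target)
    return tree

def classify(tree, sample):
    # walk the nested dict until a leaf (a class label) is reached
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][sample[attribute]]
    return tree

df = pd.read_csv('id3_data.csv')
target = 'PlayTennis'
tree = id3(df, [c for c in df.columns if c != target], target)
print("Decision tree:", tree)

# classify a new, unseen sample (values chosen for illustration; pandas
# parses the Windy column as booleans, so False is passed as a bool)
new_sample = {'Outlook': 'Sunny', 'Temperature': 'Cool',
              'Humidity': 'High', 'Windy': False}
print("Prediction:", classify(tree, new_sample))
On the PlayTennis data this picks Outlook at the root; the sample above
follows the Sunny branch to the Humidity split and its High leaf, so the
predicted class is No.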
LAB EXPERIMENT : 4
Question: Build an Artificial Neural Network by implementing the
Backpropagation algorithm and test the same using appropriate
data sets.
Answer:
Implementation Steps:
* Forward Propagation: Pass the inputs through each layer, applying an
activation function (e.g., sigmoid), to produce the network’s output.
* Calculate Error: Compute the error between the network’s output and the
target output using a loss function (e.g., Mean Squared Error).
* Update Weights and Biases: Backpropagate the error gradients through the
network and adjust the weights and biases in the direction that reduces the
error, scaled by a learning rate.
Demonstration:
Python Code:
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # assumes x is already a sigmoid output, so the derivative is x * (1 - x)
    return x * (1 - x)

class NeuralNetwork:
    def update_weights(self):
        # adjust each weight against its error gradient; the complete
        # training logic is given in the sketch below
        pass
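The original listing stops at the class skeleton, so below is a complete,
self-contained sketch of a single-hidden-layer network trained by
backpropagation. The layer sizes, learning rate, epoch count, random seed,
and the XOR demonstration data are all assumptions chosen for illustration.
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return x * (1 - x)

class NeuralNetwork:
    def __init__(self, n_inputs, n_hidden, n_outputs, learning_rate=0.5):
        rng = np.random.default_rng(42)  # seed assumed, for repeatability
        self.w1 = rng.uniform(-1, 1, (n_inputs, n_hidden))
        self.b1 = np.zeros((1, n_hidden))
        self.w2 = rng.uniform(-1, 1, (n_hidden, n_outputs))
        self.b2 = np.zeros((1, n_outputs))
        self.lr = learning_rate

    def forward(self, X):
        # forward propagation through one hidden layer
        self.hidden = sigmoid(X @ self.w1 + self.b1)
        self.output = sigmoid(self.hidden @ self.w2 + self.b2)
        return self.output

    def update_weights(self, X, y):
        # backpropagate the mean-squared-error gradients via the chain rule
        output_delta = (self.output - y) * sigmoid_derivative(self.output)
        hidden_delta = (output_delta @ self.w2.T) * sigmoid_derivative(self.hidden)
        # move each weight and bias against its gradient
        self.w2 -= self.lr * self.hidden.T @ output_delta
        self.b2 -= self.lr * output_delta.sum(axis=0, keepdims=True)
        self.w1 -= self.lr * X.T @ hidden_delta
        self.b1 -= self.lr * hidden_delta.sum(axis=0, keepdims=True)

    def train(self, X, y, epochs=10000):
        for _ in range(epochs):
            self.forward(X)
            self.update_weights(X, y)

# Demonstrate on the XOR problem (data assumed for illustration)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
nn = NeuralNetwork(2, 4, 1)
nn.train(X, y)
print("Predictions after training:\n", nn.forward(X).round(3))
With these settings the outputs should move close to the XOR targets (values
near 0 or 1) in most runs; backpropagation from random weights is not
guaranteed to converge on every seed.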
LAB EXPERIMENT : 5
Question: Write a program to implement the naïve Bayesian classifier for a
sample training data set stored as a .CSV file. Compute the accuracy of the
classifier on a few test data sets.
Answer:
Implementation Steps:
* Read CSV Data: Use pandas to load the training and testing data.
* Separate Features and Class: Divide the data into features and the target
variable.
* Calculate Class Probabilities: For each class, calculate its prior
probability from the training counts.
* Calculate Conditional Probabilities: For each feature value and class,
estimate P(value | class) from the training data.
* Predict Test Classes: For each test row, choose the class that maximises
the prior multiplied by the conditional probabilities of its feature values.
* Compute Accuracy: Compare the predicted classes with the actual classes
in the test set.
Demonstration:
# train_nb.csv
Color,Shape,Class
Red,Circle,Positive
Blue,Square,Negative
Red,Square,Positive
Blue,Circle,Negative
# test_nb.csv
Color,Shape,Class
Red,Square,Positive
Blue,Circle,Negative
Python Code:
import pandas as pd

def naive_bayes_train(train_df):
    # estimate P(class) and P(feature value | class) from frequency counts
    priors = train_df['Class'].value_counts(normalize=True).to_dict()
    likelihoods = {cls: {col: grp[col].value_counts(normalize=True).to_dict()
                         for col in train_df.columns[:-1]}
                   for cls, grp in train_df.groupby('Class')}
    return priors, likelihoods

def naive_bayes_predict(priors, likelihoods, row):
    # pick the class maximising P(class) * product of P(value | class)
    scores = dict(priors)
    for cls in scores:
        for col, value in row.items():
            scores[cls] *= likelihoods[cls][col].get(value, 0)
    return max(scores, key=scores.get)

# Demonstrate
train_df = pd.read_csv('train_nb.csv')
test_df = pd.read_csv('test_nb.csv')
priors, likelihoods = naive_bayes_train(train_df)
predictions = [naive_bayes_predict(priors, likelihoods, row)
               for _, row in test_df.drop(columns='Class').iterrows()]
actual_classes = test_df['Class'].tolist()
acc = sum(p == a for p, a in zip(predictions, actual_classes)) / len(actual_classes)
print("Accuracy:", acc)
print("Predictions:", predictions)
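With the tiny demonstration files above, both test rows should be classified
correctly (the wrong class accumulates a zero conditional probability in each
case), so the printed accuracy is 1.0.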
LAB EXPERIMENT : 6
Question: Assuming a set of documents that need to be classified,
use the naïve Bayesian Classifier model to perform this task. Built-in
Java classes/API can be used to write the program. Calculate the
accuracy, precision, and recall for your data set.
Answer:
Implementation Steps:
* Train Model: Stream the labelled training documents and train a document
categorizer (e.g., OpenNLP’s maximum-entropy DocumentCategorizerME) on them.
* Evaluate: Run the trained categorizer on held-out test documents and
compute accuracy, precision, and recall from the confusion counts.
Java Code:
import opennlp.tools.doccat.*;
import opennlp.tools.tokenize.*;
import opennlp.tools.util.*;
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DocumentClassifier {
    public static void main(String[] args) throws IOException {
        // getDocumentSamples (not shown) streams labelled training documents
        ObjectStream<DocumentSample> sampleStream =
            getDocumentSamples("path/to/training/data");
        // train a maximum-entropy document categorizer on the samples
        DoccatModel model = DocumentCategorizerME.train("en", sampleStream,
                TrainingParameters.defaultParams(), new DoccatFactory());
    }
}
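The question also asks for accuracy, precision, and recall. A minimal sketch
of those computations is shown below in Python for brevity, mirroring the
other experiments; the evaluate helper and its single positive label are
assumptions for illustration.
def evaluate(actual, predicted, positive_label):
    # confusion counts for the chosen positive label
    tp = sum(1 for a, p in zip(actual, predicted) if a == p == positive_label)
    fp = sum(1 for a, p in zip(actual, predicted) if a != positive_label and p == positive_label)
    fn = sum(1 for a, p in zip(actual, predicted) if a == positive_label and p != positive_label)
    accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall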
LAB EXPERIMENT : 7
Question: Write a program to construct a Bayesian network considering medical
data. Use this model to demonstrate the diagnosis of heart patients using a
standard heart disease data set.
Answer:
Python Code:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

try:
    data = pd.read_csv('heart.csv')
except FileNotFoundError:
    print("Error: 'heart.csv' not found. Please make sure the file is in the correct directory.")
    exit()

# encode categorical text columns as integers
le = LabelEncoder()
for col in ['Sex', 'ChestPainType', 'RestingECG', 'ExerciseAngina', 'ST_Slope']:
    data[col] = le.fit_transform(data[col])

# discretise continuous columns into named groups (bin edges are illustrative)
data['Age_Group'] = pd.cut(data['Age'], bins=[0, 40, 60, 120], labels=['Young', 'Middle', 'Old']).astype(str)
data['RestingBP_Group'] = pd.cut(data['RestingBP'], bins=[0, 120, 140, 300], labels=['Normal', 'Elevated', 'High']).astype(str)
data['Cholesterol_Group'] = pd.cut(data['Cholesterol'], bins=[-1, 200, 240, 700], labels=['Normal', 'Borderline', 'High']).astype(str)
data['MaxHR_Group'] = pd.cut(data['MaxHR'], bins=[0, 120, 160, 250], labels=['Low', 'Average', 'High']).astype(str)
data['Oldpeak_Group'] = pd.cut(data['Oldpeak'], bins=[-10, 0.5, 2.0, 10], labels=['Low', 'Medium', 'High']).astype(str)

data_discrete = data[['Age_Group', 'Sex', 'ChestPainType', 'RestingBP_Group',
                      'Cholesterol_Group', 'FastingBS', 'RestingECG', 'MaxHR_Group',
                      'ExerciseAngina', 'Oldpeak_Group', 'ST_Slope', 'HeartDisease']]

# every feature is modelled as a direct parent of HeartDisease
model = BayesianNetwork([('Age_Group', 'HeartDisease'),
                         ('Sex', 'HeartDisease'),
                         ('ChestPainType', 'HeartDisease'),
                         ('RestingBP_Group', 'HeartDisease'),
                         ('Cholesterol_Group', 'HeartDisease'),
                         ('FastingBS', 'HeartDisease'),
                         ('RestingECG', 'HeartDisease'),
                         ('MaxHR_Group', 'HeartDisease'),
                         ('ExerciseAngina', 'HeartDisease'),
                         ('Oldpeak_Group', 'HeartDisease'),
                         ('ST_Slope', 'HeartDisease')])

# learn the conditional probability tables by maximum likelihood
model.fit(data_discrete, estimator=MaximumLikelihoodEstimator)
print("Edges:", model.edges())
for cpd in model.get_cpds():
    print(cpd)

# query the probability of heart disease given partial evidence
infer = VariableElimination(model)
q = infer.query(variables=['HeartDisease'],
                evidence={'Age_Group': 'Old',
                          'RestingBP_Group': 'Elevated',
                          'Cholesterol_Group': 'High',
                          'FastingBS': 1,
                          'RestingECG': 0,
                          'MaxHR_Group': 'Average',
                          'ExerciseAngina': 1,
                          'Oldpeak_Group': 'Medium'})
print(q)
Explanation:
* Load Data: The code assumes you have a CSV file named heart.csv
containing the heart disease dataset. You’ll need to replace this with the
actual path to your dataset.
* Preprocess Data: Label-encode the categorical text columns and bin the
continuous columns (age, resting blood pressure, cholesterol, maximum heart
rate, oldpeak) into discrete groups, since the CPDs are learned over discrete
variables.
* Verify Model: We print the edges of the network and the learned CPDs to
understand the model.
Install Libraries: Install the required packages (pgmpy, pandas, and
scikit-learn) before running the script.
The output will show the structure of the learned Bayesian network and the
probability of heart disease given the specified evidence.
LAB EXPERIMENT : 8
Question: Apply the EM algorithm to cluster a set of data stored in a .CSV
file. Use the same data set for clustering with the k-Means algorithm.
Compare the results of these two algorithms and comment on the quality of
clustering.
Answer:
Python Code:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.mixture import GaussianMixture
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

try:
    data = pd.read_csv('clustering_data.csv')
except FileNotFoundError:
    print("Error: 'clustering_data.csv' not found. Please make sure the file is in the correct directory.")
    exit()

numeric_data = data.select_dtypes(include=['number'])
if numeric_data.empty:
    print("Error: No numeric columns found in the CSV file for clustering.")
    exit()

# standardise the features so no column dominates the distance computations
scaler = StandardScaler()
scaled_data = scaler.fit_transform(numeric_data)
scaled_df = pd.DataFrame(scaled_data, columns=numeric_data.columns)

n_clusters = 3  # number of clusters; choose to suit the data

# EM clustering via a Gaussian mixture model
gmm = GaussianMixture(n_components=n_clusters, random_state=42)
gmm_labels = gmm.fit_predict(scaled_data)

# Evaluate EM clustering
scaled_df['EM_Cluster'] = gmm_labels
print("EM Silhouette Score:", silhouette_score(scaled_data, gmm_labels))

# k-Means with the same number of clusters for a fair comparison
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
kmeans_labels = kmeans.fit_predict(scaled_data)
scaled_df['KMeans_Cluster'] = kmeans_labels
print("k-Means Silhouette Score:", silhouette_score(scaled_data, kmeans_labels))

if scaled_data.shape[1] >= 2:
    plt.figure(figsize=(12, 5))
    plt.subplot(1, 2, 1)
    scatter = plt.scatter(scaled_data[:, 0], scaled_data[:, 1], c=gmm_labels)
    plt.title('EM (Gaussian Mixture) Clustering')
    plt.xlabel(scaled_df.columns[0])
    plt.ylabel(scaled_df.columns[1])
    plt.colorbar(scatter, label='Cluster')
    plt.subplot(1, 2, 2)
    scatter = plt.scatter(scaled_data[:, 0], scaled_data[:, 1], c=kmeans_labels)
    plt.title('k-Means Clustering')
    plt.xlabel(scaled_df.columns[0])
    plt.ylabel(scaled_df.columns[1])
    plt.colorbar(scatter, label='Cluster')
    plt.tight_layout()
    plt.show()
else:
    print("\nNote: Cannot generate scatter plot as the data has less than 2 features.")

# Compare results
print("- Cluster shapes and sizes: EM can handle clusters with different shapes "
      "and sizes due to the covariance matrices it learns, while k-Means "
      "struggles with non-spherical or differently sized clusters.")
Explanation:
* Load Data: The code assumes your data is in a CSV file named
clustering_data.csv. Replace this with the actual file name.
* Handle Non-Numeric Data: The code selects only numeric columns for
clustering as both EM and k-Means typically operate on numerical data.
* EM Algorithm:
* fit_predict() learns the Gaussian mixture model from the data and assigns
each data point to a cluster.
* We evaluate the clustering using the Silhouette Score, which measures how
well each data point fits its assigned cluster compared to the nearest other
cluster: s(i) = (b(i) − a(i)) / max(a(i), b(i)), where a(i) is the mean
distance from point i to the other points in its own cluster and b(i) is the
mean distance to the points of the nearest neighbouring cluster. A higher
score (closer to +1) indicates better clustering.
* k-Means Algorithm:
* We initialize a KMeans model with the same number of clusters for a fair
comparison. n_init is set to improve the stability of the algorithm by running
it multiple times with different initial centroid seeds.
* Compare Results: The Silhouette Scores and the side-by-side scatter plots
are compared, with comments on where each algorithm does better.
* Install Libraries: Install scikit-learn, pandas, and matplotlib before
running the script.
* Create Data File: Create a CSV file named clustering_data.csv (or whatever
name you use in the code) with the data you want to cluster. Ensure it has
numeric columns.
The output will show the Silhouette Scores for both EM and k-Means
clustering, a scatter plot (if the data has at least two features), and
comments comparing the results and the algorithms.