
LECTURE 17

Classification
Building classification models in sklearn

Data Science, Fall 2023 @ Knowledge Stream


Sana Jabbar

1
Outline
Lecture 17

• Introduction to Classification
• Types of Classification
• Classification Algorithms
• Performance Metrics
• Overfitting and Underfitting
• Conclusion

2
Classification
● Classification is the process of recognizing objects and grouping them into categories.
● Classification refers to a predictive modeling problem where a class label is predicted for a given example of input data.
● For classification, the training dataset must be sufficiently representative of the problem and contain many examples of each class label.
● Common algorithms for classification (a minimal sklearn workflow is sketched below):
1. Logistic Regression
2. k-Nearest Neighbors
3. Decision Trees
4. Support Vector Machine
5. Naive Bayes

3
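Each of these algorithms follows the same fit/predict API in sklearn. Below is a minimal end-to-end sketch; the built-in iris dataset is used purely as a stand-in for the lecture's data.

# Minimal sklearn classification workflow (iris used as a stand-in dataset).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = LogisticRegression(max_iter=1000)  # every classifier below follows this same API
clf.fit(X_train, y_train)                # learn from the training data
y_pred = clf.predict(X_test)             # predict class labels for unseen data
print(accuracy_score(y_test, y_pred))    # fraction of correct predictions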
Logistic Regression

4
Logistic Regression

5
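A minimal sketch of logistic regression in sklearn, assuming X_train, y_train, and X_test as defined in the workflow sketch above:

# Logistic regression models P(y=1 | x) = 1 / (1 + exp(-(w.x + b))) and
# predicts the class with the highest probability.
from sklearn.linear_model import LogisticRegression

log_reg = LogisticRegression(max_iter=1000)  # max_iter raised to ensure convergence
log_reg.fit(X_train, y_train)
probs = log_reg.predict_proba(X_test)  # per-class probabilities
y_pred = log_reg.predict(X_test)       # binary case: thresholds P(y=1 | x) at 0.5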
K-Nearest Neighbor

• K-nearest neighbors is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (a sketch in sklearn follows below).
• Algorithms for classification:
1. Logistic Regression
2. k-Nearest Neighbors
3. Decision Trees
4. Support Vector Machine
5. Naive Bayes

6
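A minimal sketch of k-NN in sklearn (X_train etc. as in the workflow sketch above; n_neighbors=5 is an assumed starting value, not from the lecture):

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=5)  # classify by majority vote of the 5 nearest cases
knn.fit(X_train, y_train)                  # k-NN simply stores the training data
y_pred = knn.predict(X_test)               # similarities are computed at prediction time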
Decision Trees (DTs)

• A decision tree is a hierarchical, tree-structured model consisting of a root node, branches, internal nodes, and leaf nodes (a sketch in sklearn follows below).
• Algorithms for binary classification:
1. Logistic Regression
2. k-Nearest Neighbors
3. Decision Trees
4. Support Vector Machine
5. Naive Bayes

7
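A minimal sketch of a decision tree in sklearn (X_train etc. as above; max_depth=3 is an assumed value to keep the tree small):

from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)     # learns a hierarchy of if/else splits on the features
y_pred = tree.predict(X_test)  # each sample is routed from the root node to a leaf node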
Support Vector Machine (SVM)

• SVM classifies data by finding the optimal decision boundary that maximally separates the different classes.
• Algorithms for binary classification:
1. Logistic Regression
2. k-Nearest Neighbors
3. Decision Trees
4. Support Vector Machine
5. Naive Bayes

8
Support Vector Machine (SVM)

● Assume we have linearly separable classes.
● Multiple hyperplanes can separate these classes.

Q: Which one is the best decision boundary?

A: The one found by a maximum margin classifier (e.g., a Support Vector Machine).
9
Support Vector Machine (SVM)

● Idea: choose a "fat" separator.
● The best boundary is the one that maximizes the margin, i.e., the distance between the boundary and the "difficult points" close to the decision boundary.
● The data points that the margin pushes against are called support vectors.
● Hard margin idea: find a maximum margin classifier with no errors on the training data.
● Soft margin idea: find the maximum margin classifier while minimizing the number of training errors.

10
Support Vector Machine (SVM)

from sklearn.svm import SVC

svm_classifier = SVC(kernel='linear', C=1.0)
svm_classifier.fit(X_train, y_train)
y_pred = svm_classifier.predict(X_test)

You can choose different kernels (e.g., 'linear', 'poly', 'rbf') by specifying the kernel parameter. The C parameter controls the trade-off between maximizing the margin and minimizing the classification error.

11
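To tie this back to the soft-margin idea, a hedged sketch; the parameter values below are illustrative assumptions, not values from the lecture:

# Smaller C -> softer margin: more margin violations tolerated on the training data.
soft_svm = SVC(kernel='linear', C=0.1)
# An RBF kernel handles classes that are not linearly separable.
rbf_svm = SVC(kernel='rbf', C=1.0, gamma='scale')

soft_svm.fit(X_train, y_train)
rbf_svm.fit(X_train, y_train)
print(soft_svm.score(X_test, y_test), rbf_svm.score(X_test, y_test))  # mean accuracy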
Naive Bayes

• The Naive Bayes classifier is used for classification tasks such as text classification.
• Algorithms for binary classification:
1. Logistic Regression
2. k-Nearest Neighbors
3. Decision Trees
4. Support Vector Machine
5. Naive Bayes

12
Naive Bayes

● The Naive Bayes classifier is a family of probabilistic machine learning models based on Bayes' theorem.
● The "naive" assumption is that the features are independent given the class.
● It is commonly used for text classification tasks, particularly document categorization and spam email filtering.
● The probabilities are estimated from the data using Bayes' theorem (written out below).

13
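The estimation the last bullet refers to can be written out; this is the standard form of Bayes' theorem with the naive independence assumption (reconstructed here, not copied from the slide):

P(y \mid x_1, \ldots, x_n) = \frac{P(y)\, P(x_1, \ldots, x_n \mid y)}{P(x_1, \ldots, x_n)}
\approx \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \ldots, x_n)}

\hat{y} = \operatorname*{arg\,max}_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)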
Naive Bayes

Given Outlook, Temperature, Humidity, and Wind information, we want to predict Play: Yes or No.

Mathematically, we compare P(Play = Yes | features) with P(Play = No | features) for a sunny outlook, high humidity, cool temperature, and weak wind.

14
Naive Bayes

15
16
Naive Bayes

Play = No is more likely!


17
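A hedged sketch of the same kind of computation in sklearn. CategoricalNB expects integer-encoded categorical features; the variable names and the encoding below are hypothetical, not from the lecture:

import numpy as np
from sklearn.naive_bayes import CategoricalNB

# X_weather: rows of integer-encoded [Outlook, Temperature, Humidity, Wind];
# y_play: 0 = No, 1 = Yes. Both assumed to be loaded elsewhere.
nb = CategoricalNB(alpha=1.0)  # alpha: Laplace smoothing for unseen feature values
nb.fit(X_weather, y_play)

# Hypothetical encoding: sunny=0, cool=2, high humidity=0, weak wind=0.
query = np.array([[0, 2, 0, 0]])
print(nb.predict(query))        # predicted class (the lecture's example yields Play = No)
print(nb.predict_proba(query))  # posterior probabilities for No / Yes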
Performance Metrics

A trained classification model is evaluated with the following metrics:
● Confusion Matrix
● Recall
● Precision
● F1-Score
● ROC-AUC: the ROC curve is a graphical representation of the performance of a binary classifier.

18–22
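A minimal sketch of computing these metrics in sklearn (y_test and y_pred assumed from an earlier fit/predict step, as in the SVM example):

from sklearn.metrics import confusion_matrix, accuracy_score, recall_score, precision_score, f1_score

print(confusion_matrix(y_test, y_pred))  # rows: true classes, columns: predicted classes
print(accuracy_score(y_test, y_pred))    # fraction of correct predictions
print(recall_score(y_test, y_pred))      # TP / (TP + FN), binary case
print(precision_score(y_test, y_pred))   # TP / (TP + FP), binary case
print(f1_score(y_test, y_pred))          # harmonic mean of precision and recall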
ROC Curve
● We have data for 10 people: each person's blood level and whether or not they have the disease.

23
ROC Curve
● 5 individuals have the disease; 4 are classified correctly and 1 is misclassified.
● The true positive rate (TPR) is therefore 4/5 = 0.8.

24
ROC Curve
● 5 individuals are healthy; 3 are classified correctly and 2 are misclassified.
● The false positive rate (FPR) is therefore 2/5 = 0.4.

25
ROC Curve
● We calculate TPR and FPR for different values of the classification threshold.

26
ROC Curve
● The true positive rate (TPR) is 1.
● The false positive rate (FPR) is 1.
● (The threshold here is so low that every individual is classified as diseased.)

27
ROC Curve
● The true positive rate (TPR) is 1
● The false positive rate (FPR) is 0.8

28
ROC Curve
● The true positive rate (TPR) is 0.
● The false positive rate (FPR) is 0.
● (The threshold here is so high that every individual is classified as healthy.)

29
ROC Curve

● For the ROC point (0.2, 0.8):
● 80% of the individuals with the disease are correctly classified as diseased (TPR = 0.8).
● 20% of the healthy individuals are misclassified as diseased (FPR = 0.2).

30
ROC Curve

from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score

fpr, tpr, thresholds = roc_curve(y_true, y_scores)
auc = roc_auc_score(y_true, y_scores)

31
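A hedged note on where y_scores comes from: it must be a continuous score, typically the positive-class probability from predict_proba (or, for SVC, the output of decision_function). A sketch assuming a fitted binary classifier clf:

y_scores = clf.predict_proba(X_test)[:, 1]          # probability of the positive class
fpr, tpr, thresholds = roc_curve(y_test, y_scores)  # one (FPR, TPR) point per threshold
print(roc_auc_score(y_test, y_scores))              # area under the ROC curve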
Performance Metrics

In summary, a classification model is evaluated with:
● Confusion Matrix
● Accuracy
● Recall
● Precision
● F1-Score
● ROC-AUC
32