Lec 17 - Dsfa23
Classification
Building models of classification in sklearn
Outline
• Introduction to Classification
• Types of Classification
• Classification Algorithms
• Performance Metrics
• Overfitting and Underfitting
• Conclusion
Classification
● Classification is the process of recognizing objects and grouping them into categories.
● Classification refers to a predictive modeling problem where a class label is predicted for a given example of input data.
● For classification, the training dataset must be sufficiently representative of the problem and contain many examples of each class label.
● Commonly used classification algorithms include:
1. Logistic Regression
2. k-Nearest Neighbors
3. Decision Trees
4. Support Vector Machine
5. Naive Bayes
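As a minimal sketch of preparing data for any of these classifiers in sklearn (assuming the built-in Iris dataset purely as example data), a stratified train/test split keeps every class label represented in both sets:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Small labeled dataset: 150 samples, 4 features, 3 class labels.
X, y = load_iris(return_X_y=True)

# Hold out 25% for testing; stratify so every class appears
# proportionally in both the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

print(X_train.shape, X_test.shape)  # (112, 4) (38, 4)
```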
Logistic Regression
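A minimal sketch of fitting a logistic regression classifier in sklearn, again assuming the Iris data as a stand-in:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000)  # max_iter raised so the solver converges
clf.fit(X_train, y_train)

print(clf.score(X_test, y_test))      # mean accuracy on the held-out set
print(clf.predict_proba(X_test[:3]))  # per-class probabilities for 3 samples
```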
K-Nearest Neighbor
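A minimal k-nearest-neighbors sketch in sklearn under the same assumed Iris setup:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Predict the label by majority vote among the 5 nearest training points.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))
```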
Decision Trees (DTs)
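A minimal decision-tree sketch in sklearn under the same assumed setup:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Limit the depth to keep the tree small and reduce overfitting.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)
print(tree.score(X_test, y_test))
```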
Support Vector Machine (SVM)
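A minimal SVM sketch in sklearn under the same assumed setup; the scaling step reflects the common practice of standardizing features before fitting an SVM:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# SVMs are sensitive to feature scale, so standardize before fitting.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
svm.fit(X_train, y_train)
print(svm.score(X_test, y_test))
```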
Naive Bayes
• The Naïve Bayes classifier is used for classification tasks, like text
classification.
• It is one of the standard algorithms for binary classification listed above.
Mathematically:
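Under the naïve assumption that the features x_1, …, x_n are conditionally independent given the class y, Bayes' theorem gives the posterior used for prediction:

P(y \mid x_1, \dots, x_n) \;=\; \frac{P(y)\,\prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)}
\qquad\Rightarrow\qquad
\hat{y} \;=\; \arg\max_{y}\; P(y)\,\prod_{i=1}^{n} P(x_i \mid y)

The denominator is the same for every class, so the predicted class is simply the one with the largest numerator.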
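A minimal sketch of a naïve Bayes text classifier in sklearn, assuming a small made-up set of documents and spam/ham labels purely for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny illustrative corpus (made-up examples).
docs = [
    "win a free prize now",
    "limited offer click here",
    "meeting agenda for monday",
    "lunch with the project team",
]
labels = ["spam", "spam", "ham", "ham"]

# Bag-of-words counts feed the multinomial naive Bayes model.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(docs, labels)

print(model.predict(["free prize offer", "team meeting monday"]))
```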
Confusion Matrix
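A confusion matrix tabulates true labels against predicted labels. A minimal sketch in sklearn, assuming made-up binary labels (1 = disease, 0 = healthy):

```python
from sklearn.metrics import confusion_matrix

# Made-up binary labels: 1 = positive (disease), 0 = negative (healthy).
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]

# Rows are true classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
# -> [[3 2]
#     [1 4]]
```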
Performance Metrics
● A trained classification model is evaluated through its confusion matrix and the metrics derived from it: recall, precision, F1-score, and ROC-AUC.
● The ROC curve is a graphical representation of the performance of a binary classifier.
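A minimal sketch of computing these metrics in sklearn, reusing the made-up labels from the confusion-matrix sketch above:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, classification_report)

y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]

print(accuracy_score(y_true, y_pred))   # (TP + TN) / all = 7/10
print(precision_score(y_true, y_pred))  # TP / (TP + FP) = 4/6
print(recall_score(y_true, y_pred))     # TP / (TP + FN) = 4/5
print(f1_score(y_true, y_pred))         # harmonic mean of precision and recall
print(classification_report(y_true, y_pred))
```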
ROC Curve
● We have data for 10 people: how high their blood level is and whether or not they have the disease.
● 5 individuals have the disease; 4 of them are classified correctly and 1 is misclassified.
● The true positive rate is therefore TPR = TP / (TP + FN) = 4/5 = 0.8.
● 5 individuals are healthy; 3 of them are classified correctly and 2 are misclassified.
● The false positive rate is therefore FPR = FP / (FP + TN) = 2/5 = 0.4.
● We calculate TPR and FPR for different values of the classification threshold.
● With the threshold so low that everyone is predicted positive, TPR = 1 and FPR = 1.
● Raising the threshold a little gives TPR = 1 and FPR = 0.8.
● With the threshold so high that everyone is predicted negative, TPR = 0 and FPR = 0.
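Plotting the (FPR, TPR) pairs obtained at every threshold gives the ROC curve, and the area under it (AUC) summarizes the classifier in a single number. A minimal sketch in sklearn, assuming made-up disease labels and blood-level scores loosely modeled on the 10-person example:

```python
from sklearn.metrics import roc_curve, roc_auc_score

# Made-up data: 1 = disease, 0 = healthy, scored by blood level.
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
blood_level = [1.2, 2.0, 2.5, 3.1, 4.0, 2.8, 3.5, 4.2, 4.8, 5.5]

# roc_curve sweeps the threshold and returns one (FPR, TPR) pair per threshold.
fpr, tpr, thresholds = roc_curve(y_true, blood_level)
print(list(zip(thresholds, fpr, tpr)))

# Area under the ROC curve: 1.0 is perfect, 0.5 is random guessing.
print(roc_auc_score(y_true, blood_level))
```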
Performance Metrics (Summary)
● A classification model is evaluated through its confusion matrix and the metrics derived from it: accuracy, recall, precision, F1-score, and ROC-AUC.