
Evaluation

• Evaluation is the process of understanding the reliability of an AI model by feeding a test dataset into the model and comparing its outputs with the actual answers. There can be different evaluation techniques, depending on the type and purpose of the model. Remember that it is not recommended to use the data we used to build the model to evaluate it. This is because our model will simply remember the whole training set, and will therefore always predict the correct label for any point in the training set. This is known as overfitting. A minimal sketch of a proper train/test split follows below.
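As an illustration (not part of the original slides), here is a minimal sketch of evaluating on held-out data, assuming scikit-learn is available; the dataset and model are hypothetical placeholders:

```python
# Minimal sketch: evaluate on a held-out test set, never on the training data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Hypothetical binary-classification dataset with 1000 samples.
X, y = make_classification(n_samples=1000, random_state=42)

# Hold out 25% of the data purely for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# The model can memorise its training set (overfitting) ...
print("Train accuracy:", accuracy_score(y_train, model.predict(X_train)))
# ... so only the unseen test set gives an honest estimate of reliability.
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```

A decision tree is used here only because it overfits visibly: its training accuracy will be near 100% while its test accuracy is noticeably lower.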
• Model Evaluation Terminologies
• There are various new terms which come into the picture when we work on evaluating our model. Let us explore them with the example of the Forest Fire scenario.
• The Scenario
• Imagine that you have come up with an AI-based prediction model which has been deployed in a forest that is prone to forest fires. The objective of the model is to predict whether a forest fire has broken out in the forest or not. To understand the efficiency of this model, we need to check whether the predictions it makes are correct. Thus, there are two conditions we need to ponder upon: Prediction and Reality. The prediction is the output given by the machine, and the reality is the actual situation in the forest at the moment the prediction is made. Let us look at the various combinations that we can have with these two conditions.
• True Positive: A forest fire has broken out in the forest, and the model predicts a Yes, which means there is a forest fire. The Prediction matches the Reality.
• True Negative: There is no fire in the forest, so the Reality is No. In this case, the machine has also predicted it correctly as a No.
• False Positive: The Reality is that there is no forest fire, but the machine has incorrectly predicted that there is one.
• False Negative: A forest fire has broken out, so the Reality is Yes, but the machine has incorrectly predicted it as a No, meaning it says there is no forest fire.
Confusion matrix

• The result of the comparison between the Prediction and the Reality can be recorded in what we call the confusion matrix. The confusion matrix allows us to understand the prediction results. Note that it is not an evaluation metric but a record which can help in evaluation. Let us once again take a look at the four conditions that we went through in the Forest Fire example: Prediction and Reality can be easily mapped together with the help of the confusion matrix.
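The matrix itself appeared as an image in the original slides; its standard 2×2 layout is:

                    Reality: Yes            Reality: No
Prediction: Yes     True Positive (TP)      False Positive (FP)
Prediction: No      False Negative (FN)     True Negative (TN)

As a small sketch (assuming scikit-learn is available; the prediction and reality lists below are hypothetical), the same record can be built in code:

```python
# Build a confusion matrix from hypothetical Forest Fire predictions.
from sklearn.metrics import confusion_matrix

reality    = ["Yes", "No", "No", "Yes", "No", "Yes"]  # hypothetical ground truth
prediction = ["Yes", "No", "Yes", "No", "No", "Yes"]  # hypothetical model outputs

# labels fixes the order: rows = Reality (Yes, No), columns = Prediction (Yes, No).
cm = confusion_matrix(reality, prediction, labels=["Yes", "No"])
print(cm)  # [[TP, FN], [FP, TN]] with this label order
```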
Evaluation Methods

• Now that we have gone through all the possible combinations of Prediction and Reality, let us see how we can use these conditions to evaluate the model.
Accuracy

• Accuracy is defined as the percentage of correct predictions out of all the observations. A prediction can be said to be correct if it matches the reality. Here, we have two conditions in which the Prediction matches the Reality: True Positive and True Negative. Hence, the formula for Accuracy becomes:

Accuracy = (TP + TN) / (TP + TN + FP + FN) × 100%
• Here, total observations cover all the possible cases of prediction that can be True
Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN).
• As we can see, Accuracy tells us how often the model's predictions are correct. Let us ponder:
• Is high accuracy equivalent to good performance?
• What percentage of accuracy is reasonable to show good performance?
• Let us go back to the Forest Fire example. Assume that the model always predicts that there is no fire, but in reality there is a 2% chance of a forest fire breaking out. Out of 100 cases, the model will be right for the 98 cases with no fire, but for the 2 cases in which a forest fire actually broke out, the model will still predict no fire.
• Here,
• True Positives = 0
• True Negatives = 98
• False Positives = 0
• False Negatives = 2
• Total cases = 100
• Therefore, accuracy becomes: (98 + 0) / 100 = 98%
• This is a fairly high accuracy for an AI model. But this parameter is
useless for us as the actual cases where the fire broke out are not
taken into account. Hence, there is a need to look at another
parameter which takes account of such cases as well.
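A quick sketch of this 98% trap, using the counts from the example above:

```python
# Accuracy for the "always predict No fire" model from the example.
tp, tn, fp, fn = 0, 98, 0, 2          # counts from the 100-case example

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"Accuracy: {accuracy:.0%}")    # 98% -- yet every real fire was missed
```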
Precision
• Precision is defined as the percentage of true positive cases out of all the cases where the prediction is positive. That is, it takes into account the True Positives and the False Positives:

Precision = TP / (TP + FP) × 100%
• Going back to the Forest Fire example, assume that the model always predicts that there is a forest fire, irrespective of the reality. Then all the positive conditions are taken into account, that is, True Positive (Prediction = Yes and Reality = Yes) and False Positive (Prediction = Yes and Reality = No). In this case, the firefighters will check for the fire every time to see whether the alarm was true or false.
• You might recall the story of the boy who falsely cries out that wolves are coming, so that when they actually arrive, no one comes to his rescue. Similarly, here if the Precision is low (which means there are more false alarms than actual ones), the firefighters would get complacent and might not go and check every time, assuming it could be a false alarm.
• This makes Precision an important evaluation criterion. If Precision is high, most positive predictions are True Positives, giving fewer false alarms.
• But again, is good Precision equivalent to a good model performance?
Why?
• Let us consider a model with 100% Precision, which means that whenever the machine says there is a fire, there actually is a fire (True Positive). In the same model, there can be a rare exceptional case where there was an actual fire but the system could not detect it. This is a False Negative condition. But the Precision value would not be affected by it, because Precision does not take False Negatives into account. Is Precision alone, then, a good parameter for model performance?
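A tiny sketch of why a False Negative leaves Precision untouched (the counts are hypothetical):

```python
# Precision ignores False Negatives: 3 missed fires, yet no false alarms.
tp, fp, fn = 10, 0, 3

precision = tp / (tp + fp)
print(f"Precision: {precision:.0%}")  # 100% even though 3 real fires were missed
```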
Recall

• Another parameter for evaluating the model's performance is Recall. It can be defined as the fraction of positive cases that are correctly identified. It takes into account the cases where, in Reality, there was a fire, whether or not the machine detected it: True Positives (there was a forest fire in reality and the model predicted a forest fire) and False Negatives (there was a forest fire and the model did not predict it):

Recall = TP / (TP + FN) × 100%

• As we notice, the numerator in both Precision and Recall is the same: True Positives. But in the denominator, Precision counts the False Positives while Recall takes the False Negatives into consideration.
• Let us ponder… Which one do you think is better? Precision or Recall?
Why?
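Continuing the earlier sketch with the same hypothetical counts, Recall does penalise the missed fires that Precision ignored:

```python
# Recall counts the missed fires (False Negatives) that Precision ignored.
tp, fp, fn = 10, 0, 3

recall = tp / (tp + fn)
print(f"Recall: {recall:.0%}")  # 77%: the 3 missed fires lower Recall
```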
Which Metric is Important?

• Choosing between Precision and Recall depends on the condition in which the model has
been deployed. In a case like Forest Fire, a False Negative can cost us a lot and is risky
too. Imagine no alert being given even when there is a Forest Fire. The whole forest
might burn down.
• Another case where a False Negative can be dangerous is Viral Outbreak. Imagine a
deadly virus has started spreading and the model which is supposed to predict a viral
outbreak does not detect it. The virus might spread widely and infect a lot of people.
• On the other hand, there can be cases in which the False Positive condition costs us more
than False Negatives. One such case is Mining. Imagine a model telling you that there
exists treasure at a point and you keep on digging there but it turns out that it is a false
alarm. Here, False Positive case (predicting there is treasure but there is no treasure) can
be very costly.
• Similarly, consider a model that predicts whether a mail is spam or not. If the model always predicts that the mail is spam, people would not look at it and might eventually lose important information. Here also, the False Positive condition (predicting the mail as spam while the mail is not spam) would have a high cost.
• Think of some more examples having:
• High False Negative cost:
________________________________________________________
________________________________________________________
________________________________________________
• High False Positive cost:
________________________________________________________
________________________________________________________
________________________________________________
• To conclude the argument, if we want to know whether our model's performance is good, we need both measures: Recall and Precision. For some cases, you might have a High Precision but Low Recall, or a Low Precision but High Recall. But since both measures are important, there is a need for a parameter which takes both Precision and Recall into account.
F1 Score

• F1 Score can be defined as the measure of balance between Precision and Recall:

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)

• Take a look at the formula and think: when can we get a perfect F1 score?
• An ideal situation would be when we have a value of 1 (that is, 100%) for both Precision and Recall. In that case, the F1 Score would also be an ideal 1 (100%), which is known as the perfect value for the F1 Score. As the values of both Precision and Recall range from 0 to 1, the F1 Score also ranges from 0 to 1.
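Putting the three metrics together in one sketch (hypothetical counts again, chosen so that Precision is perfect but Recall is not):

```python
# F1 is the harmonic mean of Precision and Recall.
tp, tn, fp, fn = 10, 85, 0, 3

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * (precision * recall) / (precision + recall)
print(f"Precision: {precision:.0%}, Recall: {recall:.0%}, F1: {f1:.0%}")
```

Here Precision is 100% but the 3 missed cases pull Recall down to about 77%, so the F1 Score lands between the two at about 87%.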
• Scenario 1:
• In many schools, it often happens that there is no water to drink. In a few places, cases of water shortage in schools are very common and prominent. Hence, an AI model is designed to predict whether there is going to be a water shortage in the school in the near future or not. The confusion matrix for the same is:
Scenario 2:

• Nowadays, the problem of floods has worsened in some parts of the country. Not only do they damage the whole place, but they also force people to move out of their homes and relocate. To address this issue, an AI model has been created which can predict whether there is a chance of floods or not. The confusion matrix for the same is:
Scenario 3:

• A lot of times, people face the problem of a sudden downpour. People wash clothes and put them out to dry, but due to unexpected rain, their work gets wasted. Thus, an AI model has been created which predicts whether there will be rain or not. The confusion matrix for the same is:
Scenario 4:

• Traffic jams have become a common part of our lives nowadays. Living in an urban area means you have to face traffic every time you get out on the road. Mostly, school students opt for buses to go to school. Many times the bus gets late due to such jams and students are not able to reach their school on time. Thus, an AI model is created to predict whether there would be a traffic jam on their way to school or not. The confusion matrix for the same is:
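The four scenario matrices above were images in the original slides and did not survive extraction, so their actual counts are not shown here. As a worked sketch of how any of them would be evaluated, here are all four metrics computed from a hypothetical matrix for Scenario 1:

```python
# Evaluate one confusion matrix; the counts below are hypothetical stand-ins.
def evaluate(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F1 for one confusion matrix."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * (precision * recall) / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical Scenario 1 (water shortage) matrix: TP=40, TN=40, FP=12, FN=8.
acc, prec, rec, f1 = evaluate(tp=40, tn=40, fp=12, fn=8)
print(f"Accuracy {acc:.0%}, Precision {prec:.0%}, Recall {rec:.0%}, F1 {f1:.0%}")
```

Substituting the real counts from each scenario's matrix into the same function gives the figures needed to compare the four models.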
