CBSE - Department of Skill Education
ARTIFICIAL INTELLIGENCE
QUESTION BANK – CLASS 10
CHAPTER 8: EVALUATION
One (01) Mark Questions
1. Define Evaluation.
Moving towards deploying the model in the real world, we test it in as many ways
as possible. The stage of testing the model is known as EVALUATION.
OR
Evaluation is the process of understanding the reliability of an AI model by feeding
the test dataset into the model and comparing its outputs with the actual answers.
OR
Evaluation is a process that critically examines a program. It involves collecting and
analyzing information about a program’s activities, characteristics, and outcomes. Its
purpose is to make judgments about a program, to improve its effectiveness, and/or to
inform programming decisions.
2. Which two parameters are considered for Evaluation of a model?
Prediction and Reality are the two parameters considered for evaluation of a model.
The "Prediction" is the output given by the machine, and the "Reality" is the real
scenario at the time the prediction was made.
Precision is defined as the percentage of true positive cases versus all the cases where the
prediction is true.
That is, it takes into account the True Positives and False Positives.
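In terms of the confusion-matrix cells described later in this chapter:
Precision = TP / (TP + FP)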
Evaluation is important to ensure that the model is operating correctly and optimally.
It is an initiative to understand how well the model achieves its goals, and it helps
to determine what works well and what could be improved in a program.
6. How do you suggest which evaluation metric is more important for any case?
The F1 score is the more informative evaluation metric in most cases, because it
maintains a balance between the precision and the recall of the classifier: if the
precision is low, F1 is low, and if the recall is low, F1 is again low.
The F1 score is a number between 0 and 1 and is the harmonic mean of precision
and recall.
When both Precision and Recall have a value of 1 (that is, 100%), the F1 score is
also an ideal 1 (100%), known as the perfect value for the F1 score. As the values of
both Precision and Recall range from 0 to 1, the F1 score also ranges from 0 to 1.
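To make this behaviour concrete, here is a minimal Python sketch (not part of the CBSE material; the function name is illustrative) that computes the F1 score from a given precision and recall:

```python
# A minimal sketch showing how the F1 score balances precision and recall.

def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0 if both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score(1.0, 1.0))   # 1.0  -> the "perfect" F1 score
print(f1_score(0.9, 0.1))   # 0.18 -> one weak metric drags F1 down
print(f1_score(0.0, 1.0))   # 0.0  -> useless if precision is 0
```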
7. Which evaluation metric would be crucial in the following cases? Justify your
answer.
a. Mail Spamming
b. Gold Mining
c. Viral Outbreak
Here, Mail Spamming and Gold Mining are cases where False Positives are costly,
whereas a Viral Outbreak is a case where a False Negative is costly: an undetected
outbreak affects the health of many people and also leads to expenditure on check-ups.
So the False Negative case (VIRAL OUTBREAK) is more crucial and dangerous than the
False Positive cases.
(OR)
a. If the model always predicts that a mail is spam, people would not look at it and
might eventually lose important information. The False Positive condition (predicting
the mail as spam while the mail is not spam) would have a high cost.
b. A model says that treasure exists at a point, and you keep on digging there, but
it turns out to be a false alarm. A False Positive (predicting there is treasure when
there is none) is very costly.
c. A deadly virus has started spreading and the model which is supposed to predict a
viral outbreak does not detect it. The virus might spread widely and infect a lot of
people. Hence, a False Negative can be dangerous here.
8. What are the possible reasons for an AI model not being efficient? Explain.
Reasons for an AI model not being efficient:
a. Lack of training data: If the data is not sufficient for developing an AI model, or
if some data is missed while training the model, the model will not be efficient.
b. Unauthenticated / wrong data: If the data is not authenticated and correct, the
model will not give good results.
c. Inefficient coding / wrong algorithms: If the algorithms used are not correct and
relevant, the model will not give the desired output.
d. Not tested: If the model is not tested properly, it will not be efficient.
e. Not easy to use: If the model is not easy to implement in production, or is not
scalable, it is not efficient.
f. Less accuracy: A model is not efficient if it gives low accuracy scores on production
or test data, or if it is not able to generalize well on unseen data.
(Any three of the above can be selected)
9. Answer the following:
The F1 Score, also called the F score or F measure, is a measure of a test’s accuracy.
It is calculated from the precision and recall of the test, where the precision is the number
of correctly identified positive results divided by the number of all positive results,
including those not identified correctly, and the recall is the number of correctly identified
positive results divided by the number of all samples that should have been identified as
positive.
The F1 score is defined as the weighted harmonic mean of the test's precision and recall.
This score is calculated according to the formula:
F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
Why it is necessary:
F-Measure provides a single score that balances both the concerns of precision and recall in
one number.
A good F1 score means that you have low false positives and low false negatives, so you’re
correctly identifying real threats, and you are not disturbed by false alarms.
An F1 score is considered perfect when it’s 1, while the model is a total failure when it’s 0.
The F1 score is a better metric for evaluating our model on real-life classification
problems, especially when an imbalanced class distribution exists.
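As a worked illustration (numbers chosen for this example, not from the question bank): a classifier with Precision = 1.0 but Recall = 0.2 has an arithmetic mean of 0.6, yet F1 = 2 * (1.0 * 0.2) / (1.0 + 0.2) ≈ 0.33. The harmonic mean pulls the score down towards the weaker of the two values, which is exactly why the F1 score suits imbalanced problems.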
Confusion Matrix:
A Confusion Matrix is a table that is often used to describe the performance of a
classification model (or "classifier") on a set of test data for which the true values are
known.
(or)
A 2x2 matrix denoting the right and wrong predictions might help us analyse the rate of
success. This matrix is termed the Confusion Matrix.
Therefore, a Confusion Matrix provides a more insightful picture: not only the overall
performance of a predictive model, but also which classes are being predicted correctly
and incorrectly, and what types of errors are being made.
The confusion matrix is useful for measuring Recall (also known as Sensitivity), Precision,
Accuracy and F1 Score.
True Positive (TP)
The actual value was positive and the model predicted a positive value.
True Negative (TN)
The actual value was negative and the model predicted a negative value.
False Positive (FP) – Type 1 error
The actual value was negative but the model predicted a positive value. This is also
known as a Type 1 error.
False Negative (FN) – Type 2 error
The actual value was positive but the model predicted a negative value. This is also
known as a Type 2 error.
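As an illustration (not part of the CBSE material, and using made-up labels), the following Python sketch tallies the four cells from actual and predicted values and then derives the metrics defined above:

```python
# Tally the four confusion-matrix cells from actual vs. predicted labels
# and derive Accuracy, Precision, Recall and F1 Score (hypothetical data).

actual    = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]   # 1 = positive, 0 = negative
predicted = [1, 0, 1, 0, 1, 1, 0, 1, 0, 0]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # Type 1 error
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # Type 2 error

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(tp, tn, fp, fn)                   # 4 4 1 1
print(accuracy, precision, recall, f1)  # 0.8 0.8 0.8 0.8
```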
Example (loan approval, where the positive class is a bad loan):
The result of TN will be that good loans are correctly predicted as good loans.
The result of FP will be that (actual) good loans are incorrectly predicted as bad loans.
The result of FN will be that (actual) bad loans are incorrectly predicted as good loans.
The banks would lose a bunch of money if the actual bad loans are predicted as good loans
due to loans not being repaid. On the other hand, banks won't be able to make more
revenue if the actual good loans are predicted as bad loans. Therefore, the cost of False
Negatives is much higher than the cost of False Positives.
3. Calculate Accuracy, Precision, Recall and F1 Score for the following Confusion Matrix
on Heart Attack Risk. Also suggest which metric would not be a good evaluation
parameter here and why?
Calculation:
Accuracy:
Accuracy is defined as the percentage of correct predictions out of all the observations.
= (70 / 100)
= 0.7
Precision:
Precision is defined as the percentage of true positive cases versus all the cases where
the prediction is true.
= 50 / (50 + 20)
= 50 / 70
= 0.714
Recall:
It is defined as the fraction of positive cases that are correctly identified.
= 50 / (50 + 50)
= 50 / 100
= 0.5
F1 Score:
F1 score is defined as the measure of balance between precision and recall.
= 2 * ((0.714 * 0.5) / (0.714 + 0.5))
= 2 * (0.357 / 1.214)
= 0.588
Therefore,
Accuracy = 0.7, Precision = 0.714, Recall = 0.5, F1 Score = 0.588
Here, within the test there is a trade-off, and Recall is the metric that is not good
here: it needs to improve the most.
Because,
False Positive (impacts Precision): A person is predicted as high risk but does not
have a heart attack.
False Negative (impacts Recall): A person is predicted as low risk but does have a
heart attack.
Since False Negatives miss actual heart patients, the Recall metric needs more
improvement: False Negatives are more dangerous than False Positives here.
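As a quick sanity check of the F1 arithmetic, a short Python sketch (illustrative, not from the question bank) using the Precision and Recall stated above:

```python
# Recompute the F1 score from the Precision and Recall stated in the answer
# above (0.714 and 0.5); the result matches the 0.588 given in the summary.
precision, recall = 0.714, 0.5
print(round(2 * precision * recall / (precision + recall), 3))  # 0.588
```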
4. Calculate Accuracy, Precision, Recall and F1 Score for the following Confusion Matrix
on Water Shortage in Schools. Also suggest which metric would not be a good evaluation
parameter here and why?
              Reality: 1   Reality: 0
Prediction: 1     75            5        80
Prediction: 0      5           15        20
                  80           20       100
Calculation:
Accuracy:
Accuracy is defined as the percentage of correct predictions out of all the observations,
where the counts are True Positives (TP), True Negatives (TN), False Positives (FP) and
False Negatives (FN).
= (75 + 15) / (75 + 15 + 5 + 5)
= 90 / 100
= 0.9
Precision:
Precision is defined as the percentage of true positive cases versus all the cases where the
prediction is true.
= 75 / (75 + 5)
= 75 / 80
= 0.9375
Recall:
It is defined as the fraction of positive cases that are correctly identified.
= 75 / (75 + 5)
= 75 / 80
= 0.9375
F1 Score:
F1 score is defined as the measure of balance between precision and recall.
= 2 * ((0.9375 * 0.9375) / (0.9375 + 0.9375))
= 2 * (0.8789 / 1.875)
= 2 * 0.46875
= 0.9375
Therefore,
Accuracy = 0.9, Precision = 0.9375, Recall = 0.9375, F1 Score = 0.9375
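The Water Shortage matrix is internally consistent, so every metric can be recomputed directly from its four cells; a minimal Python sketch (illustrative, not from the question bank):

```python
# Recompute every metric for the Water Shortage matrix from its four cells
# (TP = 75, FP = 5, FN = 5, TN = 15), confirming the values derived above.
tp, fp, fn, tn = 75, 5, 5, 15

accuracy  = (tp + tn) / (tp + tn + fp + fn)          # 0.9
precision = tp / (tp + fp)                           # 0.9375
recall    = tp / (tp + fn)                           # 0.9375
f1 = 2 * precision * recall / (precision + recall)   # 0.9375
print(accuracy, precision, recall, f1)
```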
5. Calculate Accuracy, Precision, Recall and F1 Score for the following Confusion Matrix
on Spam Filtering. Also suggest which metric would not be a good evaluation parameter
here and why?
              Reality: 1   Reality: 0
Prediction: 1     10           55        65
Prediction: 0     10           25        35
                  20           80       100
Calculation:
Accuracy:
Accuracy is defined as the percentage of correct predictions out of all the observations,
where the counts are True Positives (TP), True Negatives (TN), False Positives (FP) and
False Negatives (FN).
= (10 + 25) / (10+25+55+10)
= 35 / 100
= 0.35
Precision:
Precision is defined as the percentage of true positive cases versus all the cases where
the prediction is true.
= 10 / (10 + 55)
= 10 / 65
= 0.15
Recall:
It is defined as the fraction of positive cases that are correctly identified.
= 10 / (10 + 10)
= 10 / 20
= 0.5
F1 Score:
F1 score is defined as the measure of balance between precision and recall.
= 2 * ((0.15 * 0.5) / (0.15 + 0.5))
= 2 * (0.075 / 0.65)
= 2 * 0.115
= 0.23
Therefore,
Accuracy= 0.35
Precision= 0.15
Recall= 0.5
F1 Score= 0.23
Here, within the test there is a trade-off, and Precision is the metric that is not good
here: it needs to improve the most.
Because,
False Positive (impacts Precision): A mail is predicted as "spam" but it is not spam.
False Negative (impacts Recall): A mail is predicted as "not spam" but it is spam.
Of course, too many False Negatives will make the spam filter ineffective, but False
Positives may cause important mails to be missed. Hence, Precision is the more
important metric to improve.