Machine Learning

The document discusses key concepts in machine learning, including when to use classification versus regression, how ROC curves work, and the role of support vectors in SVMs. It also covers encoding techniques like One-hot and Label Encoding, ensemble methods like bagging and boosting, and the importance of cross-validation. Additionally, it explains outlier detection methods, the significance of the P-value, and the assumptions of linear regression.


1. When should you use classification over regression?

Classification and regression are two main types of supervised learning in machine
learning, used for different types of prediction problems based on the nature of the target
variable.
1. Nature of the Target Variable: Classification is used when the target variable is
categorical or discrete. The goal is to predict the class or category to which a data point
belongs. For example, determining whether an email is spam or not, or classifying
images into labels like “cat,” “dog,” or “bird.”

Regression is used when the target variable is continuous or numeric, and the goal is
to predict a real-valued number, such as predicting house prices or temperatures.

2. Output Type: Classification outputs discrete labels. For instance, a classifier might
output labels such as “positive” or “negative,” or multiple classes like digits 0–9.
Regression outputs a continuous numerical value, such as 120,000 (house price) or 23.5
(temperature).
3. Problem Examples: Use classification for problems like medical diagnosis (disease vs.
no disease), sentiment analysis (positive/negative), or fraud detection (fraudulent/not
fraudulent). Use regression for predicting quantities like stock prices, sales forecasting,
or predicting fuel consumption.
4. Evaluation Metrics: Classification models are evaluated using metrics such as
accuracy, precision, recall, F1-score, and ROC-AUC. Regression models use metrics
like Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-squared.

You should use classification when your problem requires assigning data points to distinct
categories, and use regression when you want to predict a continuous numeric value. Choosing
the correct approach depends primarily on the nature of the output you need from your model.
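
As an illustration, here is a minimal sketch (assuming scikit-learn and synthetic data, neither of which appears in the text above) that trains a classifier on a discrete target and a regressor on a continuous one, evaluated with the metrics listed in point 4:

```python
# Minimal sketch: classification vs. regression (scikit-learn and synthetic data assumed).
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.model_selection import train_test_split

# Classification: discrete target (0 or 1), evaluated with accuracy.
Xc, yc = make_classification(n_samples=200, n_features=5, random_state=0)
Xc_tr, Xc_te, yc_tr, yc_te = train_test_split(Xc, yc, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(Xc_tr, yc_tr)
print("accuracy:", accuracy_score(yc_te, clf.predict(Xc_te)))

# Regression: continuous target, evaluated with mean squared error.
Xr, yr = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)
Xr_tr, Xr_te, yr_tr, yr_te = train_test_split(Xr, yr, random_state=0)
reg = LinearRegression().fit(Xr_tr, yr_tr)
print("MSE:", mean_squared_error(yr_te, reg.predict(Xr_te)))
```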

2. Explain how a ROC curve works.


A Receiver Operating Characteristic (ROC) curve is a graphical representation that visualizes
the performance of a binary classification model across different thresholds. It plots the True
Positive Rate (TPR) against the False Positive Rate (FPR). The curve essentially shows how
well a model can distinguish between positive and negative classes at varying levels of
confidence.
How it Works:
o Model Predictions: The classification model generates predictions, which are
typically scores or probabilities indicating the likelihood of a positive outcome.
o Adjusting Thresholds: By varying the classification threshold, the model's
predictions are altered.
o Calculating TPR and FPR: For each threshold, the TPR and FPR are calculated
based on the model's predictions and the actual labels.
o Plotting the Curve: The (FPR, TPR) pairs are plotted, creating the ROC curve.

The ROC curve shows the trade-off between sensitivity (or TPR) and specificity (1 – FPR).
Classifiers that give curves closer to the top-left corner indicate a better performance. As a baseline, a
random classifier is expected to give points lying along the diagonal (FPR = TPR). The closer the
curve comes to the 45-degree diagonal of the ROC space, the less accurate the test.

Note that the ROC does not depend on the class distribution. This makes it useful for evaluating
classifiers predicting rare events such as diseases or disasters. In contrast, evaluating performance
using accuracy (TP +TN)/(TP + TN + FN + FP) would favor classifiers that always predict a negative
outcome for rare events.
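
A minimal sketch of the procedure described above, assuming scikit-learn and a synthetic imbalanced dataset (illustrative choices, not part of the original text): the model outputs probabilities, and the threshold is swept to produce the (FPR, TPR) pairs.

```python
# Minimal sketch: computing the (FPR, TPR) pairs behind a ROC curve (scikit-learn assumed).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Imbalanced data: roughly 90% negatives, 10% positives.
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
scores = model.predict_proba(X_te)[:, 1]        # probability of the positive class

fpr, tpr, thresholds = roc_curve(y_te, scores)  # one (FPR, TPR) pair per threshold
print("AUC:", roc_auc_score(y_te, scores))      # area under the plotted curve
```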

3. What are Support Vectors in SVMs?


Support Vector Machine (SVM)
Support Vector Machine (SVM) is a popular supervised machine learning algorithm mainly
used for classification problems. It works by finding the best boundary (called a hyperplane)
that separates data points of different classes. The goal of SVM is to maximize the margin,
which is the distance between the hyperplane and the closest data points from each class.
These closest points are called support vectors, and they are important because the position
of the hyperplane depends on them.
SVM is very effective when the data is clearly separated by a margin. However, many real-world
datasets are not linearly separable, which means you cannot draw a straight line to
separate the classes. To solve this, SVM uses something called kernels.
Kernels in SVM
A kernel function helps SVM by transforming the data into a higher-dimensional space
where it is easier to find a separating hyperplane. Instead of working with the original data
directly, kernels allow SVM to learn complex boundaries.
Some common kernels used in SVM are:
• Linear Kernel:
This is the simplest kernel. It is used when data can be separated by a straight line or
hyperplane in the original space. It does not change the data.
• Polynomial Kernel:
This kernel transforms the data into a polynomial feature space. It can create curved
boundaries. The degree of the polynomial decides how complex the curve is.
• Radial Basis Function (RBF) Kernel (also called Gaussian Kernel):
This kernel maps data into an infinite-dimensional space. It works well when the data is not
linearly separable and has complex shapes or clusters.
• Sigmoid Kernel:
This kernel behaves like the activation function used in neural networks. It can model some
non-linear problems but is less common than RBF or polynomial kernels.
• SVM finds the best boundary with the largest margin between classes.
• Support vectors are the key points closest to the boundary.
• Kernels help SVM handle non-linear data by transforming it into higher dimensions.
• Common kernels are Linear, Polynomial, RBF, and Sigmoid.
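
A minimal sketch, assuming scikit-learn and a synthetic non-linear dataset (both illustrative choices, not from the original text), that fits the four kernels listed above and inspects the fitted support vectors:

```python
# Minimal sketch: SVM kernels and support vectors (scikit-learn and synthetic data assumed).
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# A dataset that is not linearly separable in its original space.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    model = SVC(kernel=kernel, degree=3, gamma="scale").fit(X, y)
    # support_vectors_ holds the points closest to the boundary that define the margin.
    print(kernel,
          "support vectors:", model.support_vectors_.shape[0],
          "training accuracy:", round(model.score(X, y), 3))
```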

10. Explain One-Hot Encoding and Label Encoding. How do they affect the dimensionality of the given dataset?

In machine learning and data preprocessing, categorical data must be converted into numerical form to
be used effectively by models. Two common encoding techniques are Label Encoding and One-Hot
Encoding.

Label Encoding assigns a unique integer to each category of a feature. For example, a "Color" feature
with categories "Red," "Blue," and "Green" can be encoded as 0, 1, and 2 respectively. This method
replaces the original categorical column with a single integer column, so the dimensionality of the
dataset remains unchanged. Label Encoding is simple and memory-efficient but introduces an implicit
ordinal relationship among categories that may not exist. This can mislead algorithms, particularly
linear models, into assuming a hierarchy between categories.

On the other hand, One-Hot Encoding converts each category into a separate binary column. For the
same "Color" example, this results in three new columns: Is_Red, Is_Blue, and Is_Green, each
containing 1 if the row corresponds to that category and 0 otherwise. This increases the dimensionality
of the dataset, as one categorical column with k unique values becomes k binary columns. One-Hot
Encoding preserves the nominal nature of categories and does not impose any order, making it suitable
for models sensitive to numerical relationships or those assuming feature independence. However, it
can lead to high-dimensional and sparse datasets.

Label Encoding is dimensionally efficient but can misrepresent category relationships, while One-Hot
Encoding preserves category independence at the cost of increased dimensionality. The choice depends
on the nature of the data and the machine learning algorithm used. Proper encoding improves model
accuracy and interpretability.
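
A minimal sketch, assuming pandas and scikit-learn (illustrative choices), using the "Color" example above: Label Encoding keeps a single column, while One-Hot Encoding expands it into k = 3 indicator columns.

```python
# Minimal sketch: Label Encoding vs. One-Hot Encoding (pandas and scikit-learn assumed).
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"Color": ["Red", "Blue", "Green", "Blue"]})

# Label Encoding: one categorical column becomes one integer column (dimensionality unchanged).
df["Color_label"] = LabelEncoder().fit_transform(df["Color"])

# One-Hot Encoding: one column with k = 3 categories becomes 3 binary indicator columns.
onehot = pd.get_dummies(df["Color"], prefix="Color")

print(df)       # original column plus its integer encoding
print(onehot)   # columns: Color_Blue, Color_Green, Color_Red
```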

11. What are bagging and boosting in Machine Learning?

Bagging and Boosting are ensemble learning techniques used to improve model performance.

• Bagging (Bootstrap Aggregating): It trains multiple models independently on different
random samples of the data (with replacement) and combines their predictions by voting or
averaging. This reduces variance and helps prevent overfitting. Example: Random Forest.

• Boosting: It trains models sequentially, where each new model focuses on correcting errors
made by previous models by giving more weight to difficult cases. This reduces bias and
improves accuracy. Example: AdaBoost, Gradient Boosting.
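
A minimal sketch, assuming scikit-learn and synthetic data (illustrative, not from the original text), comparing a bagging ensemble with an AdaBoost boosting ensemble:

```python
# Minimal sketch: bagging vs. boosting (scikit-learn and synthetic data assumed).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# Bagging: models trained independently on bootstrap samples; predictions combined by voting.
bagging = BaggingClassifier(n_estimators=50, random_state=0)

# Boosting: models trained sequentially, each reweighting the examples earlier models got wrong.
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```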

12. What is Cross-validation in Machine Learning?

Cross-validation is a statistical technique used in machine learning to evaluate and improve the
performance of a model. It involves splitting the dataset into multiple subsets, training the model on
some subsets, and testing it on the remaining ones. This helps in assessing how well the model
generalizes to unseen data.

Purpose of Cross-Validation:

• To check the model's performance on independent data.

• To avoid overfitting or underfitting.

• To make the most use of limited data by reusing it for both training and testing.

Types of Cross-Validation:
1. Hold-out Method:

o Split the data into two sets: training and testing.

o Simple but may not give consistent results if data is limited.

2. k-Fold Cross-Validation (Most common):

o The data is divided into k equal parts.

o The model is trained on k-1 parts and tested on the remaining one.

o The process is repeated k times, each time with a different test part.

o Final result = average performance from all folds.

3. Stratified k-Fold Cross-Validation

o Similar to k-Fold but maintains the class distribution in each fold.


o Useful for imbalanced classification problems.

4. Leave-One-Out Cross-Validation (LOOCV):

o Each data point is used once as a test set, while the rest are training data.

o Very thorough but computationally expensive.

5. Time Series Cross-Validation:

o For time-dependent data.

o Ensures that training data always comes before test data chronologically.

Advantages:

• Provides a more accurate estimate of model performance.

• Reduces the chances of bias from one random train-test split.

• Helps in model selection and tuning hyperparameters.

Disadvantages:

• Can be computationally intensive, especially for large datasets.

• Not suitable for all types of data (e.g., time series needs special handling).

Cross-validation is an essential tool in the machine learning workflow. It helps ensure that the model
is not just good at learning from the training data but also performs well on new, unseen data,
making it a reliable method for model evaluation.
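
A minimal sketch of k-fold and stratified k-fold cross-validation, assuming scikit-learn and a synthetic imbalanced dataset (both illustrative, not from the original text):

```python
# Minimal sketch: 5-fold and stratified 5-fold cross-validation (scikit-learn assumed).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=300, weights=[0.8, 0.2], random_state=0)
model = LogisticRegression(max_iter=1000)

# Plain k-fold: train on 4 folds, test on the held-out fold, repeat 5 times.
kfold_scores = cross_val_score(model, X, y,
                               cv=KFold(n_splits=5, shuffle=True, random_state=0))

# Stratified k-fold: same idea, but each fold keeps the original class proportions.
strat_scores = cross_val_score(model, X, y,
                               cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))

print("k-fold mean accuracy:    ", kfold_scores.mean())
print("stratified mean accuracy:", strat_scores.mean())
```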

19. What is a True Positive Rate and a False Positive Rate?
1. True Positive Rate (TPR)
Also known as Sensitivity or Recall, it measures the proportion of actual positives that are correctly
identified by the model: TPR = TP / (TP + FN).

2. False Positive Rate (FPR)
It measures the proportion of actual negatives that are incorrectly identified as positives by the
model: FPR = FP / (FP + TN).
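
A minimal sketch, assuming scikit-learn and made-up labels (illustrative only), computing both rates from a confusion matrix:

```python
# Minimal sketch: TPR and FPR from a binary confusion matrix (scikit-learn assumed).
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual labels (made up for illustration)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # model predictions (made up for illustration)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
tpr = tp / (tp + fn)   # sensitivity / recall
fpr = fp / (fp + tn)   # 1 - specificity
print("TPR:", tpr, "FPR:", fpr)
```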

20. What do you mean by a Bag of Words (BOW)?

• Bag of Words is a fundamental technique used in Natural Language Processing (NLP) to
represent text data in a numerical form so that machine learning algorithms can process it.
• In BOW, a text (like a sentence or document) is represented as a "bag" of its words, meaning
the order or grammar of words is completely ignored.
• It creates a vocabulary list of all unique words from the entire text dataset.
• Each text sample is then converted into a vector that records the frequency (count) of each
vocabulary word in that sample.
• For example, if the vocabulary contains the words ["cat", "dog", "love"], and a sentence is "I
love my dog," the BOW vector might look like [0,1,1], showing 0 occurrences of "cat," 1 of
"dog," and 1 of "love."
• This vector representation enables machine learning models to analyse text data numerically.
• Although simple and easy to implement, BOW does not capture the context or word order,
which can be a limitation.
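
A minimal sketch, assuming scikit-learn's CountVectorizer and a few made-up sentences (illustrative choices), that builds the vocabulary and the count vectors described above:

```python
# Minimal sketch: Bag of Words with word counts (scikit-learn assumed).
from sklearn.feature_extraction.text import CountVectorizer

docs = ["I love my dog", "cat and dog", "I love my cat"]   # made-up documents

vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(docs)          # sparse matrix of word counts

print(vectorizer.get_feature_names_out())     # the learned vocabulary
print(bow.toarray())                          # one count vector per document; word order is ignored
```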

21. What type of node is considered Pure in a decision tree?


Pure Node in a Decision Tree:

✓ A pure node contains only samples from a single class (all data points belong to the same
category).
✓ It represents perfect classification at that node, so no further splitting is needed.
✓ Purity is measured using metrics like Gini index and Entropy, where a pure node has a value of
zero (no impurity).
✓ When a node is pure, it becomes a leaf node and is assigned the class label of its samples.
✓ Pure nodes help improve tree accuracy but very deep trees with many pure nodes may cause
overfitting.
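
A minimal sketch (plain Python, made-up labels) of the Gini impurity mentioned above; a pure node scores exactly zero:

```python
# Minimal sketch: Gini impurity as a purity measure (a pure node scores 0).
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini(["yes", "yes", "yes"]))        # 0.0 -> pure node, becomes a leaf
print(gini(["yes", "no", "yes", "no"]))   # 0.5 -> maximally impure for two classes
```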

29. Explain some cases where k-Means clustering fails to give good results.

k-Means is an unsupervised learning algorithm used to partition data into k distinct clusters based on
similarity. It works by minimizing the distance between data points and their assigned cluster
centroids.

Failures:

➢ Non-spherical clusters: k-Means assumes clusters are spherical, so it performs poorly on
irregular shapes.
➢ Uneven sizes/densities: It misclassifies data when clusters vary in size or density.
➢ Outliers & poor initialization: Sensitive to outliers and initial centroid placement, which can
lead to incorrect clustering.
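
A minimal sketch of the non-spherical failure case, assuming scikit-learn and its synthetic "two moons" dataset (illustrative choices): the crescent-shaped clusters are typically split incorrectly, giving a low agreement score with the true grouping.

```python
# Minimal sketch: k-Means on non-spherical (crescent-shaped) clusters (scikit-learn assumed).
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# A low score shows the spherical-cluster assumption breaking down on crescent shapes.
print("adjusted Rand index:", adjusted_rand_score(y_true, labels))
```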

30. What are the assumptions of linear regression?

a) Linearity: There is a linear relationship between the independent and dependent variables.
b) Homoscedasticity: The variance of errors is constant across all levels of the independent
variable(s).
c) Independence of errors: The residuals are independent of one another (no autocorrelation).
d) Normality of errors: The residuals are approximately normally distributed.
e) No multicollinearity: The independent variables are not highly correlated with one another.

37. 'People who bought this also bought…' recommendations seen on Amazon are a result of which
algorithm?

Collaborative Filtering Algorithm:

This algorithm suggests items based on the preferences and behaviours of many users. For example, if
users who bought item A also bought item B, then item B will be recommended to others who buy
item A.
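
A minimal sketch of item-based collaborative filtering, using a tiny made-up user-item purchase matrix and scikit-learn's cosine similarity (both illustrative assumptions): items bought by the same users come out as highly similar, which is the basis for the recommendation.

```python
# Minimal sketch: item-based collaborative filtering via cosine similarity
# (NumPy, scikit-learn, and a made-up purchase matrix assumed).
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Rows = users, columns = items A, B, C (1 = bought).
purchases = np.array([
    [1, 1, 0],   # user 1 bought A and B
    [1, 1, 0],   # user 2 bought A and B
    [0, 0, 1],   # user 3 bought C
])

item_similarity = cosine_similarity(purchases.T)   # compare item columns to each other
print(item_similarity)  # A and B are highly similar, so B is recommended to buyers of A
```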

28. What is the difference between a Cost Function and Gradient Descent?


Definition:
• Cost Function: A function that measures the error or difference between predicted and actual values.
• Gradient Descent: An optimization algorithm that minimizes the cost function by updating model parameters.

Purpose:
• Cost Function: Quantify how well the model is performing.
• Gradient Descent: Iteratively adjust parameters to reduce the error.

Output:
• Cost Function: A scalar value representing the model’s error.
• Gradient Descent: Updated model parameters after each iteration.

Role in Training:
• Cost Function: Provides a metric to optimize.
• Gradient Descent: The method used to minimize the metric.

Example:
• Cost Function: Mean Squared Error (MSE), Cross-Entropy Loss.
• Gradient Descent: Step-by-step parameter update using gradients.

Process:
• Cost Function: Calculate error between predictions and actual targets.
• Gradient Descent: Calculate gradient of cost function and update weights accordingly.

Goal:
• Cost Function: To evaluate model performance.
• Gradient Descent: To find parameters that minimize the cost function.
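
A minimal sketch (NumPy assumed; the data and learning rate are made up) showing the two roles side by side: the cost function scores the current parameters, and gradient descent repeatedly updates them to reduce that score.

```python
# Minimal sketch: gradient descent minimizing an MSE cost function for y = w*x + b (NumPy assumed).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])          # underlying relationship: y = 2x + 1

w, b, lr = 0.0, 0.0, 0.05                   # initial parameters and learning rate

def cost(w, b):
    """Cost function: mean squared error between predictions and targets."""
    return np.mean((w * x + b - y) ** 2)

for _ in range(2000):
    error = w * x + b - y
    w -= lr * 2 * np.mean(error * x)        # gradient descent step for w
    b -= lr * 2 * np.mean(error)            # gradient descent step for b

print(round(w, 3), round(b, 3), round(cost(w, b), 6))   # approaches w = 2, b = 1, cost near 0
```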

38. How can you select k for k-means?

To select the best number of clusters k:

• Use the Elbow Method: Run k-means with different k values and plot the total within-cluster
sum of squares (inertia) against k. The point where the decrease in inertia slows down sharply
(forming an “elbow”) suggests the optimal k.

• Use the Silhouette Score: Calculate the average silhouette coefficient for different k values.
The k with the highest silhouette score indicates well-separated clusters.
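
A minimal sketch, assuming scikit-learn and synthetic blob data with 4 true clusters (illustrative choices), printing the quantities behind both methods:

```python
# Minimal sketch: elbow (inertia) and silhouette scores for different k (scikit-learn assumed).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ = within-cluster sum of squares (look for the "elbow" as k grows);
    # the silhouette score peaks at the k that gives the best-separated clusters.
    print(k, round(km.inertia_, 1), round(silhouette_score(X, km.labels_), 3))
```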

39. What is a P-value?

• The P-value is the probability of observing data as extreme as (or more extreme than) the
sample results, assuming the null hypothesis is true.
• It helps measure the strength of evidence against the null hypothesis in a statistical test.
• A low P-value (usually < 0.05) suggests that the observed results are unlikely due to chance,
indicating statistical significance.
• A high P-value means there is insufficient evidence to reject the null hypothesis.

46. Explain two different ways to detect outliers.


Two Ways to Detect Outliers

1 Z-Score Method:
Calculate the Z-score for each value; points with Z-scores greater than 3 or less than -3 are
outliers because they lie far from the mean.

2 Interquartile Range (IQR) Method:
Calculate IQR as Q3 − Q1; points outside the range [Q1 − 1.5 × IQR, Q3 + 1.5 × IQR] are outliers.
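
A minimal sketch of both methods, assuming NumPy and a made-up sample with one injected extreme value:

```python
# Minimal sketch: Z-score and IQR outlier detection (NumPy and made-up data assumed).
import numpy as np

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(50, 5, 200), [120.0]])   # 120 is an injected outlier

# 1. Z-score method: flag points more than 3 standard deviations from the mean.
z = (data - data.mean()) / data.std()
print("z-score outliers:", data[np.abs(z) > 3])

# 2. IQR method: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
mask = (data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)
print("IQR outliers:", data[mask])   # the injected value (and possibly a few extreme normal points)
```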

47. What is SVM? Can you name some kernels used in SVM?

Support Vectors in SVMs:

• Support vectors are the data points closest to the decision boundary (hyperplane).

• They define the maximum margin — the widest gap between classes that the SVM tries to
achieve.

• The margin is the distance between two parallel lines on either side of the hyperplane, each
touching the nearest data points.

• Support vectors directly influence the position and orientation of the hyperplane.

• If a support vector is moved or removed, the hyperplane changes; other points have no effect.

• Mathematically, support vectors have non-zero Lagrange multipliers in the SVM
optimization problem.

• The model depends only on support vectors, making it robust and less sensitive to outliers
away from the margin.

• Using only support vectors reduces computational complexity and increases efficiency.

• Visually, support vectors lie on the margin boundaries parallel to the hyperplane.

• In summary, support vectors are the key points that “support” and define the optimal
separating hyperplane in SVM classification.

48. Explain TF-IDF vectorization.

For a detailed explanation, visit this page: https://www.geeksforgeeks.org/understanding-tf-idf-term-frequency-inverse-document-frequency/

TF-IDF (Term Frequency-Inverse Document Frequency)


TF-IDF (Term Frequency-Inverse Document Frequency) is a statistical measure used in natural
language processing and information retrieval to evaluate the importance of a word in a document
relative to a collection of documents (corpus).

TF-IDF combines two components: Term Frequency (TF) and Inverse Document Frequency (IDF).

Term Frequency (TF): Measures how often a word appears in a document. A higher frequency
suggests greater importance. If a term appears frequently in a document, it is likely relevant to the
document’s content. Formula: TF(t, d) = (number of times term t appears in document d) / (total
number of terms in document d).

Inverse Document Frequency (IDF): Reduces the weight of common words across multiple
documents while increasing the weight of rare words. If a term appears in fewer documents, it is more
likely to be meaningful and specific. Formula: IDF(t) = log(N / df(t)), where N is the total number of
documents and df(t) is the number of documents containing term t.

TF-IDF Calculation

The TF-IDF score is calculated by multiplying these two statistics: TF-IDF(t, d) = TF(t, d) × IDF(t).

Vectorization

When transforming text to vectors using TF-IDF, each document is represented as a vector. Each
dimension of the vector corresponds to a separate term from the corpus vocabulary.

The value in each dimension is the TF-IDF score of the term in the document. To handle the vast size
of the vocabulary and the sparsity of individual document vectors, implementations usually use sparse
matrix representations.
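
A minimal sketch, assuming scikit-learn's TfidfVectorizer and a few made-up documents (note that scikit-learn applies a smoothed variant of the IDF formula above):

```python
# Minimal sketch: TF-IDF vectorization of a small corpus (scikit-learn assumed).
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)        # sparse matrix: one row per document

print(vectorizer.get_feature_names_out())     # vocabulary terms = vector dimensions
print(tfidf.toarray().round(2))               # TF-IDF weight of each term in each document
```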
