DL Unit 1
Artificial Intelligence (AI) refers to the development of computer systems capable of performing tasks that
require human intelligence. AI aids in processing large amounts of data, identifying patterns, and making decisions.
A decision tree is a supervised machine-learning algorithm that can be used for both classification
and regression problems. The algorithm builds its model in the structure of a tree along with
decision nodes and leaf nodes. A decision tree is a series of sequential decisions made to reach a
specific result. Here's an illustration of a decision tree in action, using a loan-approval example:
First, it checks if the customer has a good credit history. Based on that, it classifies the customer
into two groups: customers with good credit history and customers with bad credit history. Then, it
checks the customer’s income and again classifies him/her into two groups. Finally, it checks the
loan amount requested by the customer. Based on the outcomes from checking these three features,
the decision tree decides whether the customer’s loan should be approved.
The features/attributes and conditions can change based on the data and complexity of the problem,
but the overall idea remains the same. So, a decision tree makes a series of decisions based on a set
of features/attributes present in the data: credit history, income, and loan amount.
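To make the idea concrete, here is a minimal scikit-learn sketch of such a loan-approval decision tree. The feature encoding, the data, and the labels are all made up for illustration and are not taken from the text above.

```python
# A minimal sketch of the loan-approval decision tree described above.
# The feature names, data, and labels are hypothetical, just to illustrate the idea.
from sklearn.tree import DecisionTreeClassifier

# Each row: [good_credit_history (0/1), income (thousands), loan_amount (thousands)]
X = [
    [1, 60, 20],
    [1, 30, 50],
    [0, 80, 10],
    [0, 25, 40],
    [1, 90, 30],
    [0, 40, 60],
]
# 1 = loan approved, 0 = loan rejected (made-up labels)
y = [1, 0, 1, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

# Predict for a new customer: good credit history, income 55k, requesting 25k
print(tree.predict([[1, 55, 25]]))
```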
Random Forest is a tree-based machine-learning algorithm that leverages the power of multiple
decision trees to make decisions. As the name suggests, it is a “forest” of trees!
But why do we call it a “random” forest? That’s because it is a forest of randomly created decision
trees. Each node in the decision tree works on a random subset of features to calculate the output.
The random forest then combines the output of individual decision trees to generate the final output.
Bootstrapping is the process of randomly sampling items (with replacement) from the training dataset. Each
decision tree is trained on such a randomized sample, and the random forest makes the
final decision based on majority voting across the trees.
In simple words:
The Random Forest Algorithm combines the output of multiple (randomly created) Decision Trees
to generate the final output.
This process of combining the output of multiple individual models (also known as weak learners)
is called Ensemble Learning.
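As a rough illustration of this ensemble idea, here is a minimal scikit-learn sketch. The dataset is synthetic and the hyperparameter values are arbitrary; bootstrap=True and max_features="sqrt" are the options that give each tree its random sample of rows and random subset of features.

```python
# A minimal sketch of ensemble learning with a random forest classifier.
# The dataset is a synthetic toy problem; parameter values are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# bootstrap=True: each tree sees a random bootstrap sample of the training data
# max_features="sqrt": each split considers a random subset of features
forest = RandomForestClassifier(
    n_estimators=100, bootstrap=True, max_features="sqrt", random_state=0
)
forest.fit(X_train, y_train)

# The forest's prediction is the majority vote of its individual trees
print("test accuracy:", forest.score(X_test, y_test))
```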
Property                  Random Forest                          Decision Tree
Bias-Variance Trade-off   Lower variance, reduced overfitting    Higher variance, prone to overfitting
Robustness                More robust to outliers and noise      Sensitive to outliers and noise
Unsupervised learning with random forest is done by constructing a joint distribution based
on your independent variables that roughly describes your data.
Then simulate a certain number of observations using this distribution.
For example, if you have 1000 observations, you could simulate 1000 more. Then you label
them, e.g. 1 := real observation, 0 := simulated observation.
After this, you run a usual random forest classifier trying to distinguish the real observations
from the simulated ones.
Note that you must have the calculate-proximity option turned on.
The really useful output is exactly this: a description of proximity between your observations
based on what Random Forest does when trying to assign these labels.
You now have a description of how "close" or "similar" your observations are to each
other, and you can cluster them using many techniques.
A straightforward one would be to select a threshold for these "distances", i.e. group
together observations that are closer than that threshold.
Another easy option is to do hierarchical clustering but using this particular distance matrix.
If you can work with R, most hierarchical clustering packages allow you to feed the
functions custom distance matrices.
You then select a cutoff point, you may visualize it as a dendrogram and so on and so forth.
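Here is one possible Python sketch of this real-vs-simulated procedure. scikit-learn has no built-in "calculate proximity" option (that flag belongs to the original R randomForest implementation), so the proximity matrix is approximated below by counting how often two observations fall in the same leaf; simulating observations by permuting each column independently is also just one common choice.

```python
# A rough sketch of the unsupervised random-forest clustering idea described above,
# assuming synthetic observations are created by permuting each column independently.
# Proximities are approximated by counting how often two observations share a leaf.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
X_real = rng.normal(size=(200, 5))            # stand-in for your real observations

# Simulate "fake" observations by shuffling each column (breaks joint structure)
X_fake = np.column_stack([rng.permutation(col) for col in X_real.T])

X = np.vstack([X_real, X_fake])
y = np.concatenate([np.ones(len(X_real)), np.zeros(len(X_fake))])  # 1=real, 0=simulated

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Proximity of two real observations = fraction of trees placing them in the same leaf
leaves = forest.apply(X_real)                 # shape: (n_real_observations, n_trees)
prox = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)
dist = 1.0 - prox                             # turn similarity into a distance
np.fill_diagonal(dist, 0.0)

# Hierarchical clustering on this custom distance matrix
Z = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels[:20])
```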
There used to be a very good tutorial on Random Forest clustering, with some
useful R functions written for this purpose, but the link seems to be dead now.
Maybe it will come back up later. The same authors also wrote a very neat random GLM R package
(which is analogous to random forest but based on GLMs) if you want to check that out.
3a) What are kernel methods in Deep learning? Explain.
The fundamental premise of kernel methods is to convert the input data into a high-dimensional
feature space, which makes it simpler to distinguish between classes or generate predictions. Rather
than computing the feature space explicitly, kernel methods employ a kernel function to implicitly
map the data into that feature space.
The most popular kind of kernel approach is the Support Vector Machine (SVM), a binary
classifier that determines the best hyperplane that most effectively divides the two groups. In order
to efficiently locate the ideal hyperplane, SVMs map the input into a higher-dimensional space
using a kernel function.
Other examples of kernel methods include kernel ridge regression, kernel PCA, and Gaussian
processes. Since they are strong, adaptable, and computationally efficient, kernel approaches are
frequently employed in machine learning. They are resilient to noise and outliers and can handle
sophisticated data structures like strings and graphs.
The kernel function in SVMs is essential in determining the decision boundary that divides the
various classes. In order to calculate the degree of similarity between any two points in the feature
space, the kernel function computes their dot product.
The most commonly used kernel function in SVMs is the Gaussian or radial basis function (RBF)
kernel. The RBF kernel maps the input data into an infinite-dimensional feature space using a
Gaussian function. This kernel function is popular because it can capture complex nonlinear
relationships in the data.
Other types of kernel functions that can be used in SVMs include the polynomial kernel, the
sigmoid kernel, and the Laplacian kernel. The choice of kernel function depends on the specific
problem and the characteristics of the data.
Basically, kernel methods in SVMs are a powerful technique for solving classification and
regression problems, and they are widely used in machine learning because they can handle
complex data structures and are robust to noise and outliers.
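A minimal sketch of an RBF-kernel SVM in scikit-learn is shown below; the dataset and the values of C and gamma are purely illustrative, not a recommendation.

```python
# A minimal sketch of a kernel method: an SVM with the RBF (Gaussian) kernel.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# kernel="rbf" implicitly maps inputs into a high-dimensional feature space;
# gamma controls the width of the Gaussian, C the softness of the margin.
clf = SVC(kernel="rbf", gamma=1.0, C=1.0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```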
o Mercer's condition: A kernel function must satisfy Mercer's condition to be valid. This
condition ensures that the kernel function is positive semi-definite, which means that the
kernel (Gram) matrix it produces for any set of inputs has no negative eigenvalues (a small
numerical check follows this list).
o Positive definiteness: A kernel function is positive definite if it is always greater than zero
except for when the inputs are equal to each other.
o Non-negativity: A kernel function is non-negative, meaning that it produces non-negative
values for all inputs.
o Symmetry: A kernel function is symmetric, meaning that it produces the same value
regardless of the order in which the inputs are given.
o Reproducing property: A kernel function satisfies the reproducing property if it can be
used to reconstruct the input data in the feature space.
o Smoothness: A kernel function is said to be smooth if it produces a smooth transformation
of the input data into the feature space.
o Complexity: The complexity of a kernel function is an important consideration, as more
complex kernel functions may lead to overfitting and reduced generalization performance.
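As referenced above, the symmetry and Mercer (positive semi-definiteness) properties can be checked numerically on a concrete Gram matrix. The snippet below is only an empirical sanity check on random data, not a proof.

```python
# A small numerical check (not a proof) of two kernel properties listed above:
# symmetry and positive semi-definiteness of the RBF kernel's Gram matrix.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))

K = rbf_kernel(X, X, gamma=0.5)               # Gram matrix K[i, j] = k(x_i, x_j)

print("symmetric:", np.allclose(K, K.T))
# Mercer's condition: all eigenvalues of the Gram matrix should be >= 0
# (allowing a tiny negative tolerance for floating-point error)
eigvals = np.linalg.eigvalsh(K)
print("positive semi-definite:", eigvals.min() >= -1e-10)
```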
Types of SVM
SVM can be of two types:
o Linear SVM: Linear SVM is used for linearly separable data, which means that if a dataset can
be classified into two classes by using a single straight line, then such data is termed
linearly separable data, and the classifier used is called the Linear SVM classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable data, which means that if
a dataset cannot be classified by using a straight line, then such data is termed non-linear
data, and the classifier used is called the Non-linear SVM classifier.
Hyperplane and Support Vectors in the SVM algorithm:
The dimensions of the hyperplane depend on the features present in the dataset, which means that if
there are 2 features (as shown in the image), then the hyperplane will be a straight line, and if there
are 3 features, then the hyperplane will be a 2-dimensional plane.
We always create a hyperplane that has a maximum margin, which means the maximum distance
between the hyperplane and the nearest data points.
Support Vectors:
The data points or vectors that are closest to the hyperplane and which affect the position of the
hyperplane are termed Support Vectors. Since these vectors support the hyperplane, they are called
support vectors.
The working of the SVM algorithm can be understood by using an example. Suppose we have a
dataset that has two tags (green and blue), and the dataset has two features x1 and x2. We want a
classifier that can classify the pair (x1, x2) of coordinates as either green or blue. Consider the below
image:
As it is a 2-D space, we can easily separate these two classes by just using a straight line. But
there can be multiple lines that can separate these classes. Consider the below image:
Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or
region is called a hyperplane. The SVM algorithm finds the closest points of the lines from both
classes. These points are called support vectors. The distance between the vectors and the
hyperplane is called the margin, and the goal of SVM is to maximize this margin.
The hyperplane with maximum margin is called the optimal hyperplane.
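The following sketch fits a linear SVM on a 2-D toy dataset and reads off the support vectors and the margin width; the data and the value of C are illustrative only.

```python
# A minimal sketch of a linear SVM on a 2-D, linearly separable toy dataset,
# showing how to inspect the support vectors and the maximum margin.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.0, random_state=0)

clf = SVC(kernel="linear", C=1e3)             # a large C approximates a hard margin
clf.fit(X, y)

print("support vectors:\n", clf.support_vectors_)
# For a linear SVM, the margin width is 2 / ||w||
w = clf.coef_[0]
print("margin width:", 2.0 / np.linalg.norm(w))
```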
Non-Linear SVM:
If data is linearly arranged, then we can separate it by using a straight line, but for non-linear data,
we cannot draw a single straight line. Consider the below image:
So to separate these data points, we need to add one more dimension. For linear data, we have used
two dimensions x and y, so for non-linear data, we will add a third dimension z. It can be calculated
as:
z = x² + y²
By adding the third dimension, the sample space will become as below image:
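The effect of this mapping can be demonstrated with a small sketch: concentric-circle data that no straight line can separate becomes linearly separable once the third feature z = x² + y² is added. The dataset parameters are illustrative.

```python
# A minimal sketch of the z = x² + y² mapping described above: concentric-circle
# data that no straight line can separate becomes linearly separable after adding z.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

# Add the third dimension z = x^2 + y^2
z = (X[:, 0] ** 2 + X[:, 1] ** 2).reshape(-1, 1)
X3 = np.hstack([X, z])

# A purely linear SVM now separates the classes in the lifted 3-D space
clf = SVC(kernel="linear").fit(X3, y)
print("training accuracy in 3-D space:", clf.score(X3, y))
```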
3b) Explain the terms overfitting and underfitting in ML.
Overfitting and Underfitting are the two main problems that occur in machine learning and degrade
the performance of the machine learning models.
The main goal of each machine learning model is to generalize well. Here, generalization defines
the ability of an ML model to provide a suitable output for a given set of unseen inputs.
It means that after being trained on the dataset, the model can produce reliable and accurate output. Hence,
underfitting and overfitting are the two terms that need to be checked to judge whether the
model is generalizing well or not.
Before understanding overfitting and underfitting, let's understand some basic terms that will help
to understand this topic well:
o Signal: It refers to the true underlying pattern of the data that helps the machine learning
model to learn from the data.
o Noise: Noise is unnecessary and irrelevant data that reduces the performance of the model.
o Bias: Bias is a prediction error that is introduced in the model due to oversimplifying the
machine learning algorithms. Or it is the difference between the predicted values and the
actual values.
o Variance: If the machine learning model performs well with the training dataset, but does
not perform well with the test dataset, then variance occurs.
Overfitting
Overfitting occurs when our machine learning model tries to cover all the data points, or more than
the required data points, present in the given dataset. Because of this, the model starts capturing noise
and inaccurate values present in the dataset, and all these factors reduce the efficiency and accuracy
of the model. The overfitted model has low bias and high variance.
The chances of overfitting increase the more training we provide to our model: the more we train the
model, the higher the chance of obtaining an overfitted model.
Example: The concept of overfitting can be understood by the below graph of the linear
regression output:
As we can see from the above graph, the model tries to cover all the data points present in the
scatter plot. It may look efficient, but in reality it is not, because the goal of the regression
model is to find the best-fit line; here we have not got a best fit, so the model will generate
prediction errors.
Some ways to reduce overfitting (a small cross-validation sketch follows this list):
o Cross-Validation
o Training with more data
o Removing features
o Early stopping the training
o Regularization
o Ensembling
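As referenced above, here is a small sketch of how cross-validation can expose overfitting: an unrestricted decision tree scores almost perfectly on its training data but worse under cross-validation, while a shallower tree generalizes better. The dataset and depth values are illustrative.

```python
# A minimal sketch of using cross-validation to spot overfitting: a very deep
# decision tree fits the training data almost perfectly but generalizes worse.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)

for depth in (None, 3):                       # None = grow the tree until leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    cv_score = cross_val_score(tree, X, y, cv=5).mean()
    train_score = tree.fit(X, y).score(X, y)
    print(f"max_depth={depth}: train={train_score:.2f}, cross-val={cv_score:.2f}")
```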
Underfitting
Underfitting occurs when our machine learning model is not able to capture the underlying trend of
the data. To avoid overfitting in the model, the feeding of training data can be stopped at an early
stage, due to which the model may not learn enough from the training data. As a result, it may fail to
find the best fit of the dominant trend in the data.
In the case of underfitting, the model is not able to learn enough from the training data, and hence it
reduces the accuracy and produces unreliable predictions.
Example: We can understand underfitting using the below output of the linear regression model:
As we can see from the above diagram, the model is unable to capture the data points present in the
plot.
Goodness of Fit
The "Goodness of fit" term is taken from the statistics, and the goal of the machine learning models
to achieve the goodness of fit. In statistics modeling, it defines how closely the result or predicted
values match the true values of the dataset.
The model with a good fit lies between the underfitted and overfitted models; ideally, it makes
predictions with zero error, but in practice this is difficult to achieve.
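The difference between underfitting, a good fit, and overfitting can be seen numerically by varying the degree of a polynomial regression on noisy data; the degrees chosen below (1, 4, 15) and the synthetic sine data are illustrative only.

```python
# A minimal sketch contrasting underfitting, a good fit, and overfitting by varying
# the degree of a polynomial regression on noisy sine data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(60, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(scale=0.2, size=60)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):                     # underfit, good fit, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree={degree}: train R^2={model.score(X_train, y_train):.2f}, "
          f"test R^2={model.score(X_test, y_test):.2f}")
```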
6a) Explain how random forests give output for classification and regression problems.
Random Forest is a popular machine learning algorithm that belongs to the supervised learning
technique. It can be used for both Classification and Regression problems in ML. It is based on the
concept of ensemble learning, which is a process of combining multiple classifiers to solve a
complex problem and to improve the performance of the model.
As the name suggests, "Random Forest is a classifier that contains a number of decision trees on
various subsets of the given dataset and takes the average to improve the predictive accuracy of
that dataset." Instead of relying on one decision tree, the random forest takes the prediction from
each tree and based on the majority votes of predictions, and it predicts the final output.
The greater number of trees in the forest leads to higher accuracy and prevents the problem
of overfitting.
The below diagram explains the working of the Random Forest algorithm:
Note: To better understand the Random Forest Algorithm, you should have knowledge of the
Decision Tree Algorithm.
Since the random forest combines multiple trees to predict the class of the dataset, it is possible that
some decision trees may predict the correct output, while others may not. But together, all the trees
predict the correct output. Therefore, below are two assumptions for a better Random forest
classifier:
o There should be some actual values in the feature variable of the dataset so that the classifier
can predict accurate results rather than a guessed result.
o The predictions from each tree must have very low correlations.
Below are some points that explain why we should use the Random Forest algorithm:
o It predicts output with high accuracy, and it runs efficiently even for large datasets.
Random Forest works in two phases: the first is to create the random forest by combining N decision
trees, and the second is to make predictions using each tree created in the first phase.
The working process can be explained in the below steps and diagram:
Step-1: Select random K data points from the training set.
Step-2: Build the decision trees associated with the selected data points (subsets).
Step-3: Choose the number N of decision trees that you want to build.
Step-4: Repeat Step 1 & 2.
Step-5: For new data points, find the predictions of each decision tree, and assign the new data
points to the category that wins the majority votes.
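The steps above can be sketched from scratch as follows; the dataset, the number of trees N, and the tree settings are illustrative, and scikit-learn's RandomForestClassifier performs all of this internally.

```python
# A rough from-scratch sketch of the steps above: draw bootstrap subsets, build one
# decision tree per subset, then take a majority vote for a new data point.
import numpy as np
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
rng = np.random.default_rng(0)
N = 25                                        # Step 3: number of trees to build

trees = []
for _ in range(N):
    # Steps 1-2 (repeated per Step 4): random bootstrap subset, then fit a tree on it
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(
        DecisionTreeClassifier(max_features="sqrt", random_state=0).fit(X[idx], y[idx])
    )

# Step 5: each tree votes on a new data point; the majority wins
new_point = X[:1]
votes = [int(t.predict(new_point)[0]) for t in trees]
print("votes:", Counter(votes), "-> prediction:", Counter(votes).most_common(1)[0][0])
```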
The working of the algorithm can be better understood by the below example:
Example: Suppose there is a dataset that contains multiple fruit images. So, this dataset is given to
the Random forest classifier. The dataset is divided into subsets and given to each decision tree.
During the training phase, each decision tree produces a prediction result, and when a new data
point occurs, then based on the majority of results, the Random Forest classifier predicts the final
decision. Consider the below image:
Below are some major sectors where Random Forest is mostly used:
1. Banking: Banking sector mostly uses this algorithm for the identification of loan risk.
2. Medicine: With the help of this algorithm, disease trends and risks of the disease can be
identified.
3. Land Use: We can identify the areas of similar land use by this algorithm.
o It enhances the accuracy of the model and prevents the overfitting issue.
o Although random forest can be used for both classification and regression tasks, it is not
as well suited to regression tasks.
Random forest is one of the most popular algorithms for regression problems (i.e. predicting
continuous outcomes) because of its simplicity and high accuracy. In this guide, we’ll give you a
gentle introduction to random forest and the reasons behind its high popularity.
Let’s start with an actual problem. Imagine you want to buy real estate, and you want to figure out
what comprises a good deal so that you don’t get taken advantage of.
The obvious thing to do would be to look at historic prices of houses sold in the area, then create
some kind of decision criteria to summarize the average selling prices given the real-estate
specification. You can use the decision chart to evaluate whether the listed price for the apartment
you are considering is a bargain or not. It could look like this:
The chart represents a decision tree through a series of yes/no questions, which lead you from the
real-estate description (“3 bedrooms”) to its historic average price. You can use the decision tree to
predict what the expected price of a real estate would be, given its attributes.
However, you could come up with a distinctly different decision tree structure:
This would also be a valid decision chart, but with totally different decision criteria. These decisions
are just as well-founded and show you information that was absent in the first decision tree.
The random forest regression algorithm takes advantage of the ‘wisdom of the crowds’. It takes
multiple (but different) regression decision trees and makes them ‘vote’. Each tree needs to predict
the expected price of the real estate based on the decision criteria it picked. Random forest
regression then calculates the average of all of the predictions to generate a great estimate of what
the expected price for a real estate should be.
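A minimal sketch of random forest regression for such a real-estate problem is shown below; the feature names (bedrooms, square meters, distance to center) and the prices are entirely hypothetical.

```python
# A minimal sketch of random forest regression for the real-estate example above.
# The feature names and numbers are hypothetical, purely for illustration.
from sklearn.ensemble import RandomForestRegressor

# Each row: [bedrooms, square_meters, distance_to_center_km]
X = [
    [3, 90, 5.0],
    [2, 60, 2.0],
    [4, 120, 8.0],
    [1, 40, 1.5],
    [3, 100, 3.0],
    [5, 150, 10.0],
]
y = [300_000, 280_000, 350_000, 220_000, 330_000, 380_000]   # made-up prices

reg = RandomForestRegressor(n_estimators=200, random_state=0)
reg.fit(X, y)

# The predicted price is the average of the individual trees' predictions
print(reg.predict([[3, 95, 4.0]]))
```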
Random forest regression is used to solve a variety of business problems where the company needs
to predict a continuous value:
1. Predict future prices/costs. Whenever your business is trading products or services (e.g. raw
materials, stocks, labor, service offerings, etc.), you can use random forest regression to
predict what the prices of these products and services will be in the future.
2. Predict future revenue. Use random forest regression to model your operations. For example,
you can input your investment data (advertisement, sales materials, cost of hours worked on
long-term enterprise deals, etc.) and your revenue data, and random forest will discover the
connection between the input and the output.
3. Compare performance. Imagine that you’ve just launched a new product line. The problem
is, it’s unclear whether the new product is attracting more (and higher spending) customers
than the existing product line. Use random forest regression to determine how your new
product compares to your existing ones.
Random forest regression is extremely useful in answering interesting and valuable business
questions, but there are additional reasons why it is one of the most used machine learning
algorithms.
1.3 What are the advantages of random forest for real production applications?
Random forest regression is a popular algorithm due to its many benefits in production settings:
1. Extremely high accuracy. Thanks to its ‘wisdom of the crowds’ approach, random forest
regression achieves extremely high accuracy. It usually produces better results than
linear models, including linear regression and logistic regression.
2. Scales well. Computationally, the algorithm scales well when new features or samples are
added to the dataset.
3. Easy to use. Random forest works with both categorical and numerical input variables, so
you spend less time one-hot encoding or labeling data. It’s not sensitive to missing data, and
it can handle outliers to a certain extent.