228w1f0065 ML
BY
VARRI ASHOK KUMAR
(228W1F0065)
Regression Analysis in Machine Learning
Regression analysis is a statistical method for modeling the relationship between a
dependent (target) variable and one or more independent (predictor) variables.
More specifically, regression analysis helps us understand how the value of the
dependent variable changes with respect to one independent variable while the
other independent variables are held fixed.
It predicts continuous/real values such as temperature, age, salary, price, etc.
Regression is a supervised learning technique that helps in finding the
correlation between variables and enables us to predict a continuous output
variable based on one or more predictor variables.
It is mainly used for prediction, forecasting, time-series modeling, and
determining cause-and-effect relationships between variables.
Regression fits a line or curve to the datapoints on the target-predictor graph
in such a way that the vertical distance between the datapoints and the
regression line is minimized.
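The "minimize the vertical distance" idea above can be sketched in a few lines of NumPy; the data points here are invented for illustration:

```python
# A minimal sketch of ordinary least squares: find the slope and
# intercept that minimize the sum of squared vertical distances.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # predictor
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # target

# np.polyfit with degree 1 solves the least-squares line fit
slope, intercept = np.polyfit(x, y, deg=1)
pred = slope * x + intercept
residuals = y - pred   # the vertical distances being minimized
```

Any other choice of slope and intercept would give a larger sum of squared residuals on this data.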
Examples of regression include:
•Predicting rainfall using temperature and other factors
•Determining market trends
•Predicting road accidents due to rash driving
Terminologies Related to Regression Analysis
Dependent Variable: The main factor in regression analysis that we want to
predict or understand is called the dependent variable. It is also called the
target variable.
Independent Variable: The factors that affect the dependent variable, or that
are used to predict its values, are called independent variables, also known as
predictors.
Outliers: An outlier is an observation with a very low or very high value
compared to the other observed values. An outlier can distort the results, so
it should be handled carefully.
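One common way to flag outliers is the interquartile-range (IQR) rule; the 1.5×IQR threshold below is a conventional choice, not the only one, and the data is made up:

```python
# A small sketch of flagging outliers with the IQR rule.
import numpy as np

values = np.array([24, 25, 26, 27, 25, 26, 95])  # 95 looks suspicious
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = values[(values < low) | (values > high)]  # values outside the fences
```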
Multicollinearity: If the independent variables are highly correlated with each
other, the condition is called multicollinearity. It should not be present in
the dataset, because it creates problems when ranking the most influential
variables.
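A quick way to spot multicollinearity is to inspect the correlation matrix of the predictors; in this invented example, x2 is deliberately built as an almost exact copy of x1:

```python
# A minimal sketch of detecting multicollinearity via correlations.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 2.0 * x1 + rng.normal(scale=0.01, size=100)  # nearly a copy of x1
x3 = rng.normal(size=100)                          # unrelated predictor

# Rows are variables; corr[i, j] is the correlation between xi and xj
corr = np.corrcoef(np.vstack([x1, x2, x3]))
# corr[0, 1] is close to 1, signalling multicollinearity between x1 and x2
```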
Underfitting and Overfitting: If our algorithm works well with the training
dataset but not with the test dataset, the problem is called overfitting. If
our algorithm does not perform well even on the training dataset, the problem
is called underfitting.
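Both effects can be seen by fitting polynomials of different degrees to noisy data and comparing training error against error on held-out points; the sine-shaped data here is a standard illustration, not from the source:

```python
# A sketch of under- vs overfitting using polynomial fits of two degrees.
import numpy as np

rng = np.random.default_rng(1)
x_train = np.linspace(0, 1, 20)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.1, size=20)
x_test = np.linspace(0.02, 0.98, 20)   # held-out points between training points
y_test = np.sin(2 * np.pi * x_test) + rng.normal(scale=0.1, size=20)

def poly_errors(degree):
    """Mean squared error on training and test data for a given degree."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_err, test_err

# Degree 1 underfits (large error everywhere); a high degree drives the
# training error down but can start chasing the noise.
train1, test1 = poly_errors(1)
train9, test9 = poly_errors(9)
```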
Types of Regression
There are various types of regression used in data science and machine
learning.
Each type has its own importance in different scenarios, but at the core, all
regression methods analyze the effect of the independent variables on the
dependent variable.
Linear Regression
Logistic Regression
Polynomial Regression
Linear Regression
Linear regression is a statistical regression method used for predictive
analysis.
It is one of the simplest and easiest algorithms; it models the relationship
between continuous variables and is used for solving regression problems in
machine learning.
Linear regression shows the linear relationship between the independent
variable (X-axis) and the dependent variable (Y-axis), hence called linear
regression.
If there is only one input variable (x), it is called simple linear
regression; if there is more than one input variable, it is called multiple
linear regression.
As an example of the relationship captured by a linear regression model,
consider predicting the salary of an employee on the basis of years of
experience.
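The salary-vs-experience example can be sketched directly; the figures below are invented for illustration:

```python
# A hedged sketch of simple linear regression: years of experience (x)
# predicting salary (y). All numbers are made-up example data.
import numpy as np

years = np.array([1, 2, 3, 4, 5, 6], dtype=float)
salary = np.array([30, 35, 41, 46, 50, 56], dtype=float)  # in $1000s

b1, b0 = np.polyfit(years, salary, deg=1)   # slope, intercept

def predict(x):
    """Predicted salary (in $1000s) for x years of experience."""
    return b0 + b1 * x
```

For instance, `predict(7)` extrapolates the fitted line to estimate the salary at 7 years of experience.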
Some popular applications of linear regression are:
•Salary forecasting
Logistic Regression
Logistic regression is used for classification problems, where the output is
categorical. It uses the concept of a threshold level: values above the
threshold are rounded up to 1, and values below the threshold are rounded down
to 0.
There are three types of logistic regression:
•Binary (0/1, pass/fail)
•Multinomial (cats, dogs, lions)
•Ordinal (low, medium, high)
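The sigmoid-plus-threshold rule described above can be sketched as follows; the weight, bias, and threshold values are arbitrary examples:

```python
# A minimal sketch of the logistic (sigmoid) function and the
# threshold rule: probabilities above the threshold map to class 1.
import numpy as np

def sigmoid(z):
    """Squashes any real number into the (0, 1) probability range."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_class(x, w=2.0, b=-1.0, threshold=0.5):
    # w, b, and threshold are arbitrary illustrative values
    p = sigmoid(w * x + b)
    return int(p >= threshold)
```

With these example parameters, an input of 1.0 gives a probability above 0.5 (class 1), while 0.0 gives one below it (class 0).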
Polynomial Regression
Polynomial regression is a type of regression that models a non-linear
dataset using a linear model.
It is similar to multiple linear regression, but it fits a non-linear curve
between the values of x and the corresponding conditional values of y.
Suppose a dataset consists of datapoints arranged in a non-linear fashion. In
such a case, linear regression will not fit those datapoints well; to cover
them, we need polynomial regression.
In polynomial regression, the original features are transformed into polynomial
features of a given degree and then modeled using a linear model, which means
the datapoints are best fitted using a polynomial curve.
The equation for polynomial regression is derived from the linear regression
equation: the linear equation Y = b0 + b1*x is extended to the polynomial
equation Y = b0 + b1*x + b2*x^2 + b3*x^3 + ... + bn*x^n.
Here Y is the predicted/target output, b0, b1, ..., bn are the regression
coefficients, and x is the independent/input variable.
The model is still linear because it is linear in the coefficients, even though the features themselves are polynomial (quadratic, cubic, and so on).
This differs from multiple linear regression in that polynomial regression
raises a single variable to different degrees, instead of using multiple
variables each of the same degree.
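The "polynomial features, linear model" idea can be sketched for degree 2; the data is generated from a known quadratic plus noise so the fit can be checked:

```python
# A sketch of polynomial regression: expand x into polynomial features
# [1, x, x^2], then fit the coefficients with ordinary least squares.
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-3, 3, 50)
# Synthetic data from a known quadratic: y = 1 + 2x + 0.5x^2 + noise
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.1, size=50)

# Design matrix with columns [1, x, x^2]; the model is linear in b0, b1, b2
X = np.vstack([np.ones_like(x), x, x**2]).T
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coeffs   # should roughly recover 1.0, 2.0, 0.5
```

Note that the least-squares step is exactly the one used in multiple linear regression; only the feature construction differs.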
Applications:
To predict the spread rate of COVID-19 and other infectious diseases.
To capture non-linear relationships between variables by fitting a non-linear
regression curve, which may not be possible with simple linear regression.
Comparison Between Regression and Classification
Comparison Between Regression vs Classification vs Clustering
Linear vs Logistic vs Polynomial Regression