Linear Regression
Ankita Bhanushali
Pursuing M.Sc. Statistics
Consultant, Kantar
INSAID, March 2020 GCD Cohort
Linear Regression
Regression analysis is one of the most widely used methods for prediction. It is
applied whenever we assume a causal relationship between variables, that is, when
one variable is thought to depend on one or more others.
Regression Analysis
We will use our typical step-by-step approach. We’ll start with the simple linear
regression model.
What Is a Linear Regression?
Let’s start with some dry theory. A linear regression is a linear approximation of a causal
relationship between two or more variables.
Ankita Bhanushali | Kantar Page 1
Regression models are highly valuable, as they are one of the most common ways to
make inferences and predictions.
There is a dependent variable, labeled Y, that is being predicted, and independent variables,
labeled x₁, x₂, and so forth. These are the predictors. Y is a function of the x variables,
and the regression model is a linear approximation of this function.
The Simple Linear Regression
The easiest regression model is the simple linear regression:
Y = β₀ + β₁x₁ + ε
Let’s see what these values mean. Y is the variable we are trying to predict and is called
the dependent variable. x₁ is the independent variable. β₀ is a constant (the intercept),
β₁ is the coefficient of x₁ (the slope), and ε is the error term.
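To make the roles of these terms concrete, here is a minimal sketch that simulates observations from the simple linear regression equation. The values of beta0 and beta1, the noise level, and the helper name simulate are made up for illustration; they do not come from any dataset in this document.

```python
import random

random.seed(0)  # fixed seed so repeated runs give the same numbers

beta0 = 2.0  # intercept (hypothetical value)
beta1 = 0.5  # slope (hypothetical value)


def simulate(x, noise_sd=1.0):
    """Return one simulated Y for a given x: beta0 + beta1 * x + epsilon."""
    epsilon = random.gauss(0.0, noise_sd)  # random error term
    return beta0 + beta1 * x + epsilon


xs = [float(i) for i in range(10)]
ys = [simulate(x) for x in xs]

for x, y in zip(xs, ys):
    print(f"x = {x:4.1f}  Y = {y:6.3f}")
```

With noise_sd set to 0 every point falls exactly on the line; a larger value scatters the points further away from it.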
The Regression Line
You may have heard about the regression line, too. When we plot the data points on
an x-y plane, the regression line is the best-fitting line through the data points. You can
take a look at a plot with some data points in the picture above. We plot the line based
on the regression equation. The grey points that are scattered are the observed
values. β₀, as we said earlier, is a constant and is the intercept of the regression line with
the y-axis. β₁ is the slope of the regression line. It shows how much y changes for each
unit change of x.
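As a rough illustration of how the intercept and slope of the best-fitting line can be obtained, the sketch below uses the ordinary least-squares formulas for a single predictor. The function name fit_line and the data points are invented for this example.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx              # slope: change in y per unit change in x
    b0 = y_mean - b1 * x_mean   # intercept: value of the line at x = 0
    return b0, b1


# Made-up points that lie roughly along y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

b0, b1 = fit_line(xs, ys)
print(f"intercept b0 = {b0:.3f}")
print(f"slope     b1 = {b1:.3f}")
```

Because the points were chosen close to y = 1 + 2x, the estimated intercept and slope come out near 1 and 2.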
The Estimator of the Error
The vertical distance between an observed value and the regression line is the estimator of the
error term ε. Its point estimate is called the residual. Now, suppose we draw a line
perpendicular to the x-axis from an observed point to the regression line. The intersection of that
line with the regression line will be a point with a y value equal to ŷ. As we said
earlier, given an x, ŷ is the value predicted by the regression line.
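The relationship between observed values, predicted values, and residuals can be sketched in a few lines. The coefficients and data points below are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
# Hypothetical fitted line and observations, for illustration only
b0, b1 = 1.0, 2.0
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.5, 2.5, 5.5, 6.5]

# Predicted value on the regression line for each x
y_hat = [b0 + b1 * x for x in xs]

# Residual: observed value minus predicted value (vertical distance, with sign)
residuals = [y - yh for y, yh in zip(ys, y_hat)]

for x, y, yh, r in zip(xs, ys, y_hat, residuals):
    print(f"x = {x:.1f}  observed = {y:.1f}  predicted = {yh:.1f}  residual = {r:+.1f}")
```

A positive residual means the observed point lies above the line, and a negative one means it lies below.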
Linear Regression in Python Example
We believe it is high time that we actually got down to it and wrote some code! So, let’s
get our hands dirty with our first linear regression example in Python.
Understanding the Dataset
Before we get started with the Python linear regression hands-on, let us explore the dataset. We
will be using the Boston House Prices Dataset, which has 506 rows, 13 attributes, and a target
column. Let’s take a quick look at the dataset.
In this Python Linear Regression example, we will train two models to predict the price.
Model Building
Now that we are familiar with the dataset, let us build the Python linear regression models.
Simple Linear Regression in Python
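Consider ‘lstat’ (percentage of lower-status population) as the independent variable and
‘medv’ (median value of owner-occupied homes, in thousands of dollars) as the dependent variable.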
Step 1: Load the Boston dataset
Step 2: Have a glance at the shape
Step 3: Have a glance at the dependent and independent variables
Step 4: Visualize the change in the variables
Step 5: Divide the data into independent and dependent variables
Step 6: Split the data into train and test sets
Step 7: Shape of the train and test sets
Step 8: Train the algorithm
Step 9: Retrieve the intercept
Step 10: Retrieve the slope
Step 11: Predicted value
Step 12: Actual value
Step 13: Evaluate the algorithm
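The steps above can be sketched end to end with pandas and scikit-learn. This is one possible outline under stated assumptions, not the original code. Because `load_boston` was removed from scikit-learn in version 1.2, the sketch builds a synthetic stand-in table with the same column names and 506 rows; its numbers are invented and are not the real Boston values. The 80/20 split, the random_state, and the names df and regressor are also assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Step 1: stand-in for loading the data (synthetic values, NOT the real dataset)
rng = np.random.default_rng(42)
lstat = rng.uniform(2.0, 38.0, size=506)
medv = 35.0 - 1.0 * lstat + rng.normal(0.0, 6.0, size=506)  # arbitrary relationship
df = pd.DataFrame({"lstat": lstat, "medv": medv})

# Step 2: glance at the shape
print(df.shape)

# Step 3: glance at the dependent and independent variables
print(df[["lstat", "medv"]].head())

# Step 4: visualize the relationship (needs matplotlib installed)
# df.plot.scatter(x="lstat", y="medv")

# Step 5: separate the independent (X) and dependent (y) variables
X = df[["lstat"]]  # 2-D, as scikit-learn expects for features
y = df["medv"]

# Step 6: split into train and test sets (80/20 is an assumed ratio)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Step 7: shapes of the train and test sets
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

# Step 8: train the algorithm
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Steps 9 and 10: retrieve the intercept and the slope
print("intercept:", regressor.intercept_)
print("slope:", regressor.coef_[0])

# Step 11: predicted values for the test set
y_pred = regressor.predict(X_test)

# Step 12: actual values next to the predictions
comparison = pd.DataFrame({"actual": y_test.to_numpy(), "predicted": y_pred})
print(comparison.head())

# Step 13: evaluate the algorithm
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)
print(f"MAE = {mae:.3f}, MSE = {mse:.3f}, RMSE = {rmse:.3f}, R^2 = {r2:.3f}")
```

To run the same steps on the real data, replace Step 1 with a DataFrame loaded from your own copy of the Boston House Prices Dataset; the remaining steps only rely on the ‘lstat’ and ‘medv’ columns.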
What Did We Learn?
We began by learning what a linear regression is. Then, we went over the process of
creating one. Afterwards, we talked about the simple linear regression, where we introduced
the linear regression equation. By then, we were done with the theory, so we got our hands
on the keyboard and explored a linear regression example in Python! We imported the relevant
libraries and loaded the data. We cleared up when exactly we need to create regressions and
started creating our own. The process consisted of several steps which, now, you should be
able to perform with ease.