Linear Regression Notes

Linear regression is a statistical method that estimates the relationship between one independent variable (height) and one dependent variable (weight) using a straight line. The document provides instructions for performing linear regression in Excel using the Analysis ToolPak, including how to interpret the output results such as the correlation coefficient, R square, and residuals. Additionally, it explains how to visualize the data and assess the model's accuracy through various plots.

Uploaded by

Khalid Obad

What is a linear regression?

Simple linear regression is a regression model that estimates the relationship between one
independent variable and one dependent variable using a straight line.

Example:
We have measured the weights and heights of 11 different participants; each row
represents a different participant. The variables are measured as:

• Weight (kg) (Y Range - dependent variable)

• Height (cm) (X Range - independent variable)

So, we want to determine if there is a relation between the weight (dependent variable) and
the height (independent variable) using linear regression to see how well the measures of
height in my sample can predict the measures of weight.
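
To make the idea concrete, here is a minimal Python sketch of the least-squares fit that a simple linear regression performs. The five height and weight pairs are hypothetical values invented for illustration (the 11 measurements from the example are not listed in these notes), and fit_line is an illustrative helper name, not part of Excel.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of X, and sum of cross-products of X and Y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data, NOT the participants from the example.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 56, 63, 71, 80]

slope, intercept = fit_line(heights_cm, weights_kg)
print(f"weight = {slope:.2f} * height + ({intercept:.2f})")
```

For this made-up sample the fitted slope is 0.75 kg per cm and the intercept is -63.5 kg.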

Installing the Analysis ToolPak

There are a few ways you can perform a linear regression in Excel, but perhaps the easiest
method is to use the Analysis ToolPak. This is an add-on created by Microsoft to provide
data analysis tools for statistical analyses.

Here are the instructions for installing the Analysis ToolPak:

1. Go to File > Options
2. Then click on Add-ins
3. At the bottom, set the Manage box to Excel Add-ins and click the Go button
4. Then, tick the Analysis ToolPak add-in and click OK.

Now, when you click on the Data ribbon, you should see a Data Analysis button in a
sub-section called Analyze.

Performing the linear regression in Excel

To perform the linear regression, click on the Data Analysis button.

Then, select Regression from the list.

You must then enter the following:

• Input Y Range – this is the data for the Y variable, otherwise known as the
dependent variable. The Y variable is the one that you want to predict in the
regression model. In this example it will be the weight data.
• Input X Range – this is the data for the X variable, otherwise known as the
independent variable. In this example it will be the height data.

• Constant is Zero – select this option if you want to force the regression line to
pass through 0, otherwise known as the origin. Doing so would mean there is no Y
intercept in the model. Generally, for linear regression, this option is not selected,
so just leave it unchecked for this example.
• Confidence Level – it is also possible to specify the confidence level for the test.
By default, the results will return the 95% confidence intervals without having to
change any options.
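
The effect of forcing the line through the origin can be sketched in Python. With no intercept, the least-squares slope reduces to the sum of X·Y divided by the sum of X². The data are hypothetical values invented for illustration, and fit_through_origin is an illustrative name.

```python
def fit_through_origin(xs, ys):
    """Least-squares slope of a line forced through (0, 0), i.e. no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Hypothetical data, NOT the participants from the example.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 56, 63, 71, 80]

slope_origin = fit_through_origin(heights_cm, weights_kg)
print(f"slope with the constant forced to zero: {slope_origin:.4f}")
```

For this sample the slope is about 0.379, whereas an ordinary fit with an intercept gives 0.75, which shows why the option is normally left unchecked.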

Output options

For the Output Options, you can specify where you want the regression results to be
placed.

• Output Range – you can highlight where you want the results to be placed in
that worksheet
• New Worksheet Ply – lets you place the results in a new worksheet
• New Workbook – lets you save the results in an entirely separate workbook

Residuals

The final set of options concerns the residuals in the analysis.

• Residuals – will return the list of predicted dependent values, based on the
regression line, as well as the residual values for each point
• Standardized Residuals – will return the standardized residuals; these values
can be useful when identifying potential outliers
• Residual Plots – will create a scatter graph where the residuals are plotted on
the Y axis and the X variable is plotted on the X axis
• Line Fit Plots – will create another scatter graph where the Y and X variables
are plotted, but it will also add the predicted Y values onto the graph
• Normal Probability Plots – will create another scatter plot, which is used to
determine whether the Y variable data fits a normal distribution.

Interpretation of the linear regression results

The results are generated into the following:

• Summary Output table
• ANOVA table
• Coefficients table
• Residual Output table
• Residual plot
• Standardized Residuals
• Line Fits plot
• Normal Probability plot

Summary Output table

In the first table called Summary Output, there are some regression statistics from the
test.

Multiple R

This is the absolute value of the correlation coefficient between the two variables of
interest. Briefly, it is a value that tells you how strong the linear relationship is.

A value of 0.65, for example, would indicate a moderate linear correlation between height
and weight measures according to the interpretation table below.

The correlation coefficient (r) itself can tell us two important things about the
correlation:

• Direction
• Strength/magnitude

Because Multiple R is an absolute value, it only reports the strength; the direction is
given by the sign of the slope in the Coefficients table.

The correlation coefficient value can be any number between –1 and +1, and it has no
units of measurement.

• Perfect positive correlation: r = 1
• Perfect negative correlation: r = –1
• No correlation: r = 0

Correlation coefficient (r)    Interpretation
0.00–0.10                      No correlation
0.10–0.39                      Weak correlation
0.40–0.69                      Moderate correlation
0.70–0.89                      Strong correlation
0.90–1.00                      Very strong correlation
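
As a rough sketch, the correlation coefficient and the interpretation bands above can be expressed in Python. The data are hypothetical values invented for illustration; pearson_r and interpret are illustrative names. Because the first two bands both include 0.10, the sketch assumes each band starts at its lower bound, so 0.10 counts as weak.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def interpret(r):
    """Map the absolute value of r onto the bands in the table above."""
    size = abs(r)
    if size < 0.10:
        return "No correlation"
    if size < 0.40:
        return "Weak correlation"
    if size < 0.70:
        return "Moderate correlation"
    if size < 0.90:
        return "Strong correlation"
    return "Very strong correlation"


# Hypothetical data, NOT the participants from the example.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 56, 63, 71, 80]

r = pearson_r(heights_cm, weights_kg)
print(f"r = {r:.4f}: {interpret(r)}")
```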

R square

The coefficient of determination (R²) indicates the proportion of the variance in the
dependent variable that is explained by the independent variable.

R² is always between 0 and 1.

To make the coefficient of determination easier to interpret, multiply it by 100 to
convert it to a percentage.

With an R² of 0.9133, we can say that 91.33% of the variability in weight is explained by
the variability in height.
The other 8.67% of the variance is not explained by the model; it is due to other factors
that were not measured in the experiment, or to measurement error.
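
A minimal Python sketch of how R² is obtained: one minus the ratio of the residual sum of squares to the total sum of squares. The data are hypothetical values invented for illustration, and the line 0.75 × height – 63.5 is the least-squares fit to those made-up values, not the model from the example.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical data and its least-squares line, NOT the example's model.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 56, 63, 71, 80]
predicted_kg = [0.75 * h - 63.5 for h in heights_cm]

r2 = r_squared(weights_kg, predicted_kg)
print(f"{r2 * 100:.2f}% of the variance in weight is explained by height")
```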

Adjusted R square

The adjusted R square takes into account the number of independent variables in the
regression analysis, and corrects for bias.

Usually, this value is only relevant when you are performing multiple linear regression,
where there is more than one independent variable in the model.
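
For reference, the usual adjusted R² formula can be sketched in Python as below. The inputs are the R² (0.9133) and sample size (11) quoted in this example; adjusted_r_squared is an illustrative name, and it is an assumption that Excel reports exactly this quantity.

```python
def adjusted_r_squared(r2, n_observations, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    n, k = n_observations, n_predictors
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# R^2 and n are taken from the example; one predictor (height).
adj = adjusted_r_squared(0.9133, n_observations=11, n_predictors=1)
print(f"adjusted R square: {adj:.4f}")
```

With a single predictor the adjustment is small: the value drops from 0.9133 to roughly 0.904.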

Standard error

The standard error of the regression is the average distance that the observed values fall
from the regression line.

The smaller the standard error, the more precise the linear regression model is.
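
More precisely, the standard error of the regression is the square root of the residual sum of squares divided by the residual degrees of freedom (n – 2 for one predictor). A minimal Python sketch, using hypothetical residuals invented for illustration:

```python
import math


def regression_standard_error(residuals, n_predictors=1):
    """sqrt(SS_residual / (n - k - 1)), in the units of the Y variable."""
    n = len(residuals)
    ss_res = sum(r ** 2 for r in residuals)
    return math.sqrt(ss_res / (n - n_predictors - 1))


# Hypothetical residuals in kg, NOT taken from the example.
residuals_kg = [1.0, -0.5, -1.0, -0.5, 1.0]

se = regression_standard_error(residuals_kg)
print(f"standard error of the regression: {se:.3f} kg")
```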

Observations

This is just the number of observations in the analysis; here, the 11 participants.

ANOVA table

The main thing you will be concerned with when looking at this table is the value under
the Significance F header; this is in fact the P value for the regression model.

To be able to interpret this, we need our hypotheses:

• Null hypothesis – there is no linear relationship between the height and weight
measures
• Alternative hypothesis – there is a linear relationship between the height and
weight measures

If my alpha was 0.05, this means I will reject the null and accept the alternative
hypothesis if P≤0.05. The opposite will be true if P>0.05; in this case, I would fail to
reject the null hypothesis.

As you can see, the P value (Significance F) for the model was considerably lower than
my alpha value of 0.05. So, I can conclude that the linear regression model is significant.
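
Significance F is derived from the F statistic in the ANOVA table, which is the regression mean square divided by the residual mean square. The Python sketch below shows that ratio using hypothetical sums of squares invented for illustration. Converting F into a P value needs the F distribution, which the Python standard library does not provide, so that step is left out.

```python
def f_statistic(ss_regression, ss_residual, n_observations, n_predictors=1):
    """F = (SS_regression / k) / (SS_residual / (n - k - 1))."""
    df_regression = n_predictors
    df_residual = n_observations - n_predictors - 1
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return ms_regression / ms_residual


# Hypothetical sums of squares for a 5-point sample, NOT from the example.
f_value = f_statistic(ss_regression=562.5, ss_residual=3.5, n_observations=5)
print(f"F = {f_value:.2f}")
```

A large F relative to its degrees of freedom corresponds to a small Significance F.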

Coefficients table

Let me now move on to the final table of results, which reports the coefficients.

The first row displays the results for the intercept; this is the point where the line of
best fit (regression line) crosses the Y axis, when the value of X is zero.

The second row displays the results for the slope.

For a simple linear regression model, the most basic version of the equation is
Y = mX + b, where m is the slope and b is the intercept.

Using the information reported from the results, we can then say:

Y = 0.800264X – 79.599

That is, weight (kg) = 0.800264 × height (cm) – 79.599.

So, in this example, if we know a participant's height (in cm), we can predict their weight
(in kg) by using this equation. For example, if a participant measured 175 cm, the model
estimates their weight to be 60.45 kg.
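
The same prediction can be written as a short Python sketch. The slope and intercept are the rounded values reported above; predict_weight is an illustrative name.

```python
SLOPE = 0.800264      # kg per cm, from the Coefficients table
INTERCEPT = -79.599   # kg, from the Coefficients table


def predict_weight(height_cm):
    """Predicted weight in kg for a given height in cm."""
    return SLOPE * height_cm + INTERCEPT


predicted = predict_weight(175)
print(f"predicted weight at 175 cm: {predicted:.2f} kg")
```

Predictions are only meaningful for heights within the range that was measured.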

Looking back at the coefficients table, we can see there are other columns which tell us
the standard error, as well as the lower and upper 95% confidence limits (or the limits for
a different confidence level, if one was entered). These values are reported for both the
intercept and the slope.

You will also notice that each coefficient has a t statistic. This value is used to
compute the P value.

Residual options

So, that's an overview of the regression model results; let me now cover the other
outputs from the regression test.

Residual Output

If you selected the Residuals option during the regression set-up, you will have a
table titled Residual Output.

For each observation from your data that was entered into the regression test, you will get
a predicted value of Y based on the regression model.

For example, if you look at the first observation in the original data, you see this
participant had a height of 167.08 cm. If I put this into the regression equation, along
with the slope and intercept values, I get the predicted weight value of 54.10999 kg.

This is what the Predicted column represents; Excel does this for each of the
observations.

Using the predicted values, Excel can then calculate the residuals.

A residual is simply the vertical distance between the actual data point and the line of
best fit, calculated as the actual value minus the predicted value.

My first participant had a height of 167.08 cm and a weight of 51.24 kg. As calculated
above, the predicted weight value based on the model was 54.10999 kg. The residual for
this point is therefore the difference between the actual weight value (51.24 kg) and the
predicted weight value (54.10999 kg), which comes out at around –2.87 kg.

Excel then repeats this process for the rest of the observations.
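
The residual calculation can be sketched in Python using the first participant's values and the rounded coefficients reported in the Coefficients section. Because those coefficients are rounded, the sketch gives a predicted weight of about 54.109 kg rather than exactly the figure Excel reports; the small difference presumably comes from Excel working with unrounded coefficients.

```python
SLOPE = 0.800264      # kg per cm, rounded value from the Coefficients table
INTERCEPT = -79.599   # kg, rounded value from the Coefficients table


def residual(height_cm, actual_weight_kg):
    """Return (predicted, residual), where residual = actual - predicted."""
    predicted = SLOPE * height_cm + INTERCEPT
    return predicted, actual_weight_kg - predicted


# First participant from the example: 167.08 cm, 51.24 kg.
predicted_kg, residual_kg = residual(167.08, 51.24)
print(f"predicted = {predicted_kg:.3f} kg, residual = {residual_kg:.3f} kg")
```

A negative residual means the participant weighs less than the model predicts for their height.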

Residual Plot

If you also selected the Residual Plots option in the Regression set-up window, you will
also get a graph returned.

This is a scatter plot of the residuals on the Y axis and the values of the independent
variable on the X axis.

Residual plots are useful to look at when investigating homogeneity of variance, which is
an assumption of the linear regression test.

Standardized Residuals

If you selected the Standardized Residuals option in the regression options, you will also
see a column called Standard Residuals in the residuals table.

The standardized residual is the residual divided by an estimate of its standard deviation.
You can think of them as Z scores.

These values are useful to look at when trying to identify potential outliers in your
sample.

Generally, any standardized residual with a value greater than 3 or less than –3 is a sign
that the observation may be an outlier.

Normal Probability plot

Finally, if you selected the Normal Probability Plots option in the regression set-up
window, you will also see a table called Probability Output and a graph, called the
Normal Probability Plot, which is a scatter plot of the data in that table.

The X axis plots the percentile value ranging from 0 to 100 and the Y axis plots the Y
variable data.

The normal probability plot is used to determine whether the data fits a normal
distribution.

Essentially, what you are looking for is a straight line of data. And, as you can see, there
is a nice straight line of data for my example, which suggests the weight data are
normally distributed.
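
The two diagnostics above, standardized residuals and the Probability Output percentiles, can be sketched together in Python. The data are hypothetical values invented for illustration. Two assumptions are made about how the tool computes its columns: that residuals are divided by their sample standard deviation (n – 1 denominator), and that the sample percentile for the i-th smallest Y value is 100 × (i – 0.5) / n. Other software may use different estimates.

```python
import statistics


def standardize(residuals):
    """Divide each residual by the sample standard deviation of the residuals."""
    sd = statistics.stdev(residuals)  # n - 1 denominator (assumed convention)
    return [r / sd for r in residuals]


def sample_percentiles(n):
    """Assumed plotting positions: 100 * (i - 0.5) / n for i = 1..n."""
    return [100 * (i - 0.5) / n for i in range(1, n + 1)]


# Hypothetical values, NOT the participants from the example.
residuals_kg = [1.0, -0.5, -1.0, -0.5, 1.0]
weights_kg = [50, 56, 63, 71, 80]

z_scores = standardize(residuals_kg)
# Observation numbers (1-based) whose standardized residual is beyond +/- 3.
outliers = [i for i, z in enumerate(z_scores, start=1) if abs(z) > 3]

# Pair each percentile with the sorted Y values, as in the Probability Output table.
probability_output = list(zip(sample_percentiles(len(weights_kg)), sorted(weights_kg)))

print("standardized residuals:", [round(z, 3) for z in z_scores])
print("possible outliers:", outliers)
print("probability output:", probability_output)
```

Plotting the second column of probability_output against the first gives the normal probability plot; points that fall close to a straight line suggest approximately normal data.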

Line Fits Plot

If you selected the Line Fit Plots option, you will also see a scatter plot
containing the data that was entered into the regression test.

In my example, I have the height measures on the X axis and the weight measures on the
Y axis.

There is also another set of data, shown in orange, which are in fact the predicted
Y values based on the model. These are the Predicted values from the residuals table.

If you would rather plot the line of best fit (which passes through the predicted values)
than show the individual Predicted values, you can remove the predicted values from the
graph and add a linear trendline to the observed data instead.
