Regression Analysis

Dr. Jai Kishun


Assistant Professor
Dept. of Biostatistics
& Health Informatics
SGPGIMS, Lucknow
Regression Analysis
Learning Objectives
1. Describe the Linear Regression Model
2. State the Regression Modeling Steps
3. Explain Regression Coefficients
4. Compute Regression Coefficients
5. Predict Response Variable
6. Interpret Computer results
Regression
Sir Francis Galton was the first to apply the word regression to
biological and psychological data. Specifically, Galton
compared the heights of children with the heights of their
fathers. He discovered that:
1. The average height of the sons of a group of tall
fathers is less than that of the fathers, and the average
height of the sons of a group of short fathers is more
than that of the fathers.
2. Tall fathers tend to have tall sons and short fathers tend
to have short sons.
Regression analysis is a mathematical measure of the
average relationship between two or more variables in
terms of the original units of the data.
Regression Analysis
Regression analysis is a very powerful statistical tool for
predicting the value of one variable, given the value of another
variable, when those variables are related to each other.
• Regression analysis is a statistical tool used to predict the
value of an unknown variable from a known variable.
• Regression analysis helps one understand how the value of the
dependent variable (outcome variable) changes when the values of
the independent variable(s) are changed.
Two types of variables are used in regression analysis.
Dependent variable: the variable we wish to explain or predict.
Independent variable: the variable used to explain or predict the
dependent variable. It is also known as a predictor, explanatory
variable, covariate, regressor, factor or carrier.
Regression analysis (contd.)
Example: Suppose we want to predict the weight of 100 healthy
persons when the height and age of those persons are given. We can
use regression analysis to predict weight (the outcome or dependent
variable) from height and age (the independent variables).
In regression analysis, we first develop a relationship (the regression
equation) between the dependent and independent variable(s) using the
given data. Later, when new values of the independent variable(s) are
given, we can use this relationship to predict the value of the
dependent variable.
Steps in regression analysis
Regression analysis includes the following steps
• Statement of problem
• Selection of potentially relevant variables
• Data collection
• Model Specification
• Choice of fitting method
• Model Fitting
• Model Validation and criticism
• Using the chosen model(s) for the solution of
the posed problem.
Utility of regression analysis
• It helps in the formulation and determination of the functional
relationship between two or more variables.
• It helps in studying a possible cause-and-effect relationship
between two variables, although a fitted regression does not by
itself prove causation.
• It helps in predicting and estimating the value of the
dependent variable from the values of the independent variables.
• It helps to measure the variability or spread of the values of a
dependent variable about the regression line.
• Regression analysis also helps to obtain a measure of the error
involved in using the regression line as a basis for estimation.
Assumptions in Regression Analysis
• An actual linear relationship exists between the variables.
• The regression equation is used to estimate values only
within the range of the data for which it is valid.
• The relationship between the dependent and
independent variables remains the same as when the
regression equation was calculated.
• The dependent variable is a random variable,
but the values of the independent variables are
fixed.
• In regression, we have only one dependent
variable in our estimating equation. However, we
can use more than one independent variable.
Regression Models
• Univariate (one explanatory variable): linear or non-linear
• Multivariable (two or more explanatory variables): linear or non-linear
GRAPHS BEFORE FITTING A MODEL
• Model - should be based on theoretical background
or the hypothesis to be tested.
• Data may be used to suggest the model.
• Graphs may be used before fitting a model.
Four possible groups of graphs are:
1. One dimensional graphs
2. Two dimensional graphs
3. Rotating plots, and
4. Dynamic graphs
One dimensional Graphs
• Histogram
• Stem and leaf display
• Dot Plot
• Box Plot
Two Dimensional Graphs
The draftsman's plot, or plot matrix, with the pairwise
correlation coefficients.
What do we expect each of the graphs in the plot
matrix to look like?
• In simple regression, the plot of Y versus X is
expected to show a linear pattern.
• In multiple regression, the scatter plots of Y
versus each predictor variable may or may not
show linear patterns. The presence of a linear
pattern is reassuring about the fit of the model,
whereas the absence of such a pattern does not
imply that our linear model is incorrect.
Univariate Linear Regression Model
Relationship between variables as a linear function:
$$Y = b_0 + b_1 x_1 + e$$
where
• $Y$ is the dependent or response variable (e.g., income)
• $x_1$ is the independent or explanatory variable (e.g., education)
• $b_0$ is the population Y-intercept
• $b_1$ is the population slope
• $e$ is the random error
• To fit the regression equation to the given data is
nothing but to estimate the unknown constants
involved in the regression line.
• According to the principle of least squares, we have to
determine the constants $a$ (the intercept, $b_0$ above) and $b$
(the slope, $b_1$ above) such that the sum of squares of the
errors of estimate is minimum. In other words, we have to minimize
$$S = \sum (Y - a - bX)^2$$
• When we minimize this, we get two equations.
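As a sketch of where the two equations come from, write the regression line as $Y = a + bX$ (here $a$ and $b$ are assumed names for the intercept and slope, corresponding to $b_0$ and $b_1$ in the model above) and set the partial derivatives of the sum of squared errors to zero:

```latex
\begin{aligned}
S(a,b) &= \sum_{i=1}^{n} (Y_i - a - bX_i)^2 \\
\frac{\partial S}{\partial a} &= -2\sum_{i=1}^{n} (Y_i - a - bX_i) = 0
  \;\Longrightarrow\; \sum Y = na + b\sum X \\
\frac{\partial S}{\partial b} &= -2\sum_{i=1}^{n} X_i\,(Y_i - a - bX_i) = 0
  \;\Longrightarrow\; \sum XY = a\sum X + b\sum X^2
\end{aligned}
```

Dividing the first equation by $n$ gives $a = \bar{Y} - b\bar{X}$, so the fitted line always passes through the point of means $(\bar{X}, \bar{Y})$.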
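Regression Equation / Line & Method of Least Squares
• Regression equation of Y on X: $Y = a + bX$
In order to obtain the values of $a$ and $b$:
$$\sum Y = na + b\sum X$$
$$\sum XY = a\sum X + b\sum X^2$$
• Regression equation of X on Y: $X = a + bY$
In order to obtain the values of $a$ and $b$:
$$\sum X = na + b\sum Y$$
$$\sum XY = a\sum Y + b\sum Y^2$$
• Here $n$ is the number of paired observations. In each case the two
equations are called normal equations.
• Regression equation of Y on X: the first normal equation gives
$a = \bar{Y} - b\bar{X}$. After putting in the value of $a$ we get
$$Y - \bar{Y} = b_{yx}(X - \bar{X})$$
• Regression equation of X on Y: similarly, after putting in the values
of $a$ and $b$ we get
$$X - \bar{X} = b_{xy}(Y - \bar{Y})$$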
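Regression Equation / Line when
Deviations are taken from the Arithmetic Mean
• Regression equation of Y on X: $Y - \bar{Y} = b_{yx}(X - \bar{X})$, where $b_{yx} = \sum xy / \sum x^2$
• Regression equation of X on Y: $X - \bar{X} = b_{xy}(Y - \bar{Y})$, where $b_{xy} = \sum xy / \sum y^2$
Here $x = X - \bar{X}$ and $y = Y - \bar{Y}$ are the deviations from the arithmetic means.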
Interpretation of regression coefficients
• The constant $b$ is known as the regression coefficient.
• $Y$ changes by $b$ units for a unit change in $X$.
• If $b$ is negative, an increase in the value of $X$
corresponds to a decrease in the value of $Y$; this means
there is negative correlation between $X$ and $Y$.
• On the other hand, if $b$ is positive, an increase in
the value of $X$ is associated with an increase in the
value of $Y$; this means there is positive
correlation between $X$ and $Y$.
Data for Univariate linear regression analysis
Sr. No.   Weight in kg (dependent)   Age in years (independent)
1         74                         34
2         75                         38
3         70                         28
4         92                         68
5         94                         64
6         93                         62
7         70                         42
8         75                         39
9         95                         70
10        88                         60
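The least-squares fit for this table can be sketched in a few lines of plain Python. The function name `fit_line` and the variable names are illustrative assumptions, not part of the original slides; the formulas are the standard closed-form solution of the two normal equations.

```python
# Weight (kg, dependent) and age (years, independent) from the table above.
weight = [74, 75, 70, 92, 94, 93, 70, 75, 95, 88]
age = [34, 38, 28, 68, 64, 62, 42, 39, 70, 60]


def fit_line(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Closed-form solution of the two normal equations.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


a, b = fit_line(age, weight)
print(f"weight = {a:.3f} + {b:.3f} * age")
```

For these ten observations the intercept comes out at about 49.86 and the slope at about 0.648.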
Univariate Linear Regression Model: Data Analysis in SPSS
In the SPSS output, the unstandardized coefficient for age is 0.648. This means that
when age increases by one unit, weight increases by 0.648 units on average; i.e., if age
increases by 10 years, the corresponding weight increases by 6.48 kg.
Regression equation: $Y = 49.862 + 0.648X + e$, where $Y$ is weight and $X$ is age.
Suppose a person's age is 30 years. That person's predicted weight would be
Weight (Y) = constant + 0.648 * age = 49.862 + 0.648 * 30
= 49.862 + 19.44 = 69.302 kg
The random error $e$ has mean zero and is not added to a point prediction. The standard
error of the age coefficient (about 0.067) describes the uncertainty in the slope and is
not added either.
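A minimal sketch of this prediction step, using the intercept and slope quoted from the SPSS output. The function name `predict_weight` is a hypothetical helper, and the sketch assumes that a point prediction uses only the fitted line, since the random error has an expected value of zero.

```python
# Coefficients quoted from the SPSS output in the slides.
INTERCEPT = 49.862
SLOPE_AGE = 0.648


def predict_weight(age_years):
    """Point prediction of weight (kg) from age; no error term is added."""
    return INTERCEPT + SLOPE_AGE * age_years


print(round(predict_weight(30), 3))
# A 10-year increase in age shifts the prediction by 10 * 0.648 kg.
print(round(predict_weight(40) - predict_weight(30), 2))
```

With these coefficients the predicted weight at age 30 is 49.862 + 19.44 = 69.302 kg.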
Regression line
• The regression line is the line which gives the best
estimate of one variable from the value of the
other given variable.
• The regression line gives the average relationship
between the two variables in mathematical form.
• The regression line has the following properties:
a) $\sum (Y - \hat{Y}) = 0$, and
b) $\sum (Y - \hat{Y})^2$ is minimum,
where $\hat{Y}$ is the value of $Y$ estimated from the line.
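Two standard properties of a least-squares line are that the residuals sum to zero and that no other line gives a smaller sum of squared residuals. The sketch below checks both numerically on the weight and age data from the slides; the helper name `sse` and the size of the perturbations are arbitrary choices made for illustration.

```python
# Weight (kg) and age (years) from the univariate example in the slides.
weight = [74, 75, 70, 92, 94, 93, 70, 75, 95, 88]
age = [34, 38, 28, 68, 64, 62, 42, 39, 70, 60]

n = len(age)
mean_x = sum(age) / n
mean_y = sum(weight) / n

# Least-squares slope and intercept in deviation form.
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(age, weight))
     / sum((x - mean_x) ** 2 for x in age))
a = mean_y - b * mean_x


def sse(intercept, slope):
    """Sum of squared residuals for the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(age, weight))


# Property (a): the residuals about the fitted line sum to (numerically) zero.
residual_sum = sum(y - (a + b * x) for x, y in zip(age, weight))
print(abs(residual_sum) < 1e-9)

# Property (b): nudging the intercept or the slope increases the squared error.
print(sse(a, b) < sse(a + 0.5, b))
print(sse(a, b) < sse(a, b + 0.01))
```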
Regression line
• For two variables $X$ and $Y$, there are always two
lines of regression.
• Regression line of $Y$ on $X$: gives the best estimate
of the value of $Y$ for any specific given value of $X$:
$$Y = a + bX$$
where $a$ = Y-intercept
$b$ = slope of the line
$Y$ = dependent variable
$X$ = independent variable
• Regression line of $X$ on $Y$: gives the best estimate
of the value of $X$ for any specific given value of $Y$:
$$X = a + bY$$
where $a$ = X-intercept
$b$ = slope of the line
$X$ = dependent variable
$Y$ = independent variable
• $b$ is also called the regression coefficient of $X$ on $Y$. It is also
denoted by $b_{xy}$. The above equation is called the regression
equation of $X$ on $Y$; it means we can write $X$ in terms of $Y$.
Height of father in cm (Y) and height of son in cm (X)
Y      X      Y^2     X^2     YX
65     68     4225    4624    4420
63     66     3969    4356    4158
67     68     4489    4624    4556
64     65     4096    4225    4160
68     69     4624    4761    4692
62     66     3844    4356    4092
70     68     4900    4624    4760
66     65     4356    4225    4290
68     71     4624    5041    4828
67     67     4489    4489    4489
69     68     4761    4624    4692
71     70     5041    4900    4970
Sum    800    811     53418   54849   54107   (columns Y, X, Y^2, X^2, YX)
Mean of Y = 66.667, mean of X = 67.583
n = 12
a = -3.37687, b = 1.036403
Regression equation of Y on X: Y = -3.38 + 1.036X
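Both regression lines for the height table can be computed from the column sums. This is a sketch in plain Python; the variable names are assumptions chosen for readability. With these data the tabulated values a = -3.37687 and b = 1.036403 correspond to the regression of Y (father) on X (son).

```python
# Heights from the table: Y = father, X = son.
father = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]
son = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]

n = len(father)
sum_y, sum_x = sum(father), sum(son)
sum_xy = sum(x * y for x, y in zip(son, father))
sum_x2 = sum(x * x for x in son)
sum_y2 = sum(y * y for y in father)

# Shared numerator of both regression coefficients: N*sum(XY) - sum(X)*sum(Y).
numerator = n * sum_xy - sum_x * sum_y

# Regression of Y on X: Y = a_yx + b_yx * X
b_yx = numerator / (n * sum_x2 - sum_x ** 2)
a_yx = (sum_y - b_yx * sum_x) / n

# Regression of X on Y: X = a_xy + b_xy * Y
b_xy = numerator / (n * sum_y2 - sum_y ** 2)
a_xy = (sum_x - b_xy * sum_y) / n

print(f"Y on X: Y = {a_yx:.3f} + {b_yx:.3f} X")
print(f"X on Y: X = {a_xy:.3f} + {b_xy:.3f} Y")
```

The two lines are different, which is why the direction of the regression (which variable is being predicted) has to be stated.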
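Linear Regression: Model
[Figure: fitted line Y = b0 + b1 X drawn through the data; for the observation at Xi, the error ei is the vertical distance between the actual value Yi and the line.]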
Regression Line
How to draw a line through these points, and how
to determine the best-fit line?
[Figure: scatter plot of the data points; the X and Y axes each run from 0 to 60.]
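Regression Lines
(A) Regression line of X on Y
$$X - \bar{X} = b_{xy}(Y - \bar{Y})$$
or
$$X = \bar{X} + b_{xy}Y - b_{xy}\bar{Y}$$
which has the form $X = a + bY$, where
$$b_{xy} = r\frac{\sigma_x}{\sigma_y} = r\frac{SD(X)}{SD(Y)} = \frac{N\sum XY - \sum X \sum Y}{N\sum Y^2 - (\sum Y)^2}$$
$b_{xy}$ = regression coefficient of X on Y
$r$ = correlation coefficient between X and Y
This line gives the better estimate of X.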
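Regression Lines (contd.)
(B) Regression line of Y on X
$$Y - \bar{Y} = b_{yx}(X - \bar{X})$$
or
$$Y = \bar{Y} + b_{yx}X - b_{yx}\bar{X}$$
which has the form $Y = a + bX$, where
$$b_{yx} = r\frac{\sigma_y}{\sigma_x} = r\frac{SD(Y)}{SD(X)} = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - (\sum X)^2}$$
$b_{yx}$ = regression coefficient of Y on X
$r$ = correlation coefficient between X and Y
This line gives the better estimate of Y.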
Interpretation of Regression lines
Explanation of Regression Line
• In the case of perfect correlation (positive or
negative), the two lines of regression coincide.
• If the two regression lines are far from each
other, the degree of correlation is low, and vice
versa.
• The mean values of X and Y can be obtained as the
point of intersection of the two regression lines.
• The higher the degree of correlation between the
variables, the smaller the angle between the lines,
and vice versa.
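The points above can be checked numerically. The sketch below fits both regression lines for two data sets from the slides (weight and age, and father and son heights), finds where the two lines cross, and measures the angle between them. The function names are hypothetical helpers, and the angle is computed in the (X, Y) plane, where the X-on-Y line has slope 1 / b_xy.

```python
import math


def regression_lines(x, y):
    """Return (a_yx, b_yx, a_xy, b_xy) for Y = a_yx + b_yx*X and X = a_xy + b_xy*Y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    b_yx, b_xy = sxy / sxx, sxy / syy
    return mean_y - b_yx * mean_x, b_yx, mean_x - b_xy * mean_y, b_xy


def intersection(a_yx, b_yx, a_xy, b_xy):
    """Solve the two line equations simultaneously (requires b_yx * b_xy != 1)."""
    x0 = (a_xy + b_xy * a_yx) / (1 - b_yx * b_xy)
    return x0, a_yx + b_yx * x0


def angle_degrees(b_yx, b_xy):
    """Angle between the two lines when both are drawn in the (X, Y) plane."""
    return abs(math.degrees(math.atan(1 / b_xy) - math.atan(b_yx)))


age = [34, 38, 28, 68, 64, 62, 42, 39, 70, 60]
weight = [74, 75, 70, 92, 94, 93, 70, 75, 95, 88]
son = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]
father = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]

lines_w = regression_lines(age, weight)      # strongly correlated pair
lines_h = regression_lines(son, father)      # more weakly correlated pair

print(intersection(*lines_w), (sum(age) / len(age), sum(weight) / len(weight)))
print(angle_degrees(lines_w[1], lines_w[3]), angle_degrees(lines_h[1], lines_h[3]))
```

In both data sets the crossing point coincides with the pair of means, and the more strongly correlated weight and age data give the smaller angle.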
Multivariate Linear Regression Model
a) Dependent variable / outcome variable (single)
b) Independent variables / explanatory variables /
predictor variables / covariates (more than one)
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots + \beta_k X_k + e$$
$Y$: dependent variable
$X_1, X_2, \ldots, X_k$: independent predictors / explanatory variables
$\beta_0$: constant term
$\beta_1, \beta_2, \beta_3, \ldots, \beta_k$: regression coefficients of the independent predictors
$e$: random error term
For a given data set on $Y, X_1, X_2, \ldots, X_k$, the constants $\beta_0, \beta_1, \beta_2, \ldots,
\beta_k$ are estimated and tested for significance. Independent
variables with significant coefficients are taken as possible
predictors of the dependent variable (Y).
Data for Multivariate linear regression analysis
Sr. No.   Weight in kg (Dep_V)   Age in years (Ind_V)   Height in cm (Ind_V)
1         74                     34                     165
2         75                     38                     159
3         70                     28                     158
4         92                     68                     175
5         94                     64                     172
6         93                     62                     168
7         70                     42                     160
8         75                     39                     156
9         95                     70                     161
10        92                     68                     175
11        94                     64                     172
12        93                     62                     168
13        64                     30                     170
14        72                     36                     165
15        72                     28                     158
16        90                     68                     174
17        91                     64                     172
18        92                     62                     168
19        65                     42                     159
20        75                     39                     156
21        85                     72                     161
22        92                     68                     172
23        90                     63                     170
24        91                     62                     165
Multivariate Linear Regression Model: Data Analysis in SPSS
Model: the unstandardized coefficient for age is 0.589 and for
height is 0.172. This means that when age increases by one unit
(with height held constant), weight increases by 0.589 units. When
height increases by one unit (with age held constant), weight
increases by only 0.172 units.
Application of Multivariate Linear Regression Model
Suppose the age of a person is 30 years and the height is 160 cm.
What is the predicted weight of that person?
Using the above regression equation:
Weight (Y) = β0 + β1X1 + β2X2 + e, where X1 is the value of age and X2 is the value of height
Weight (Y) = 23.47 + 0.589 * age + 0.172 * height + e
Weight (Y) = 23.47 + 0.589 * 30 + 0.172 * 160
= 68.66 kg (predicted weight of that person)
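The same calculation as a small Python sketch. The three coefficients are the ones quoted in the slides; the function name `predict_weight` is a hypothetical helper, and the error term is left out because a point prediction uses only the fitted part of the equation.

```python
# Coefficients quoted from the slides: constant, age and height.
B0 = 23.47
B_AGE = 0.589
B_HEIGHT = 0.172


def predict_weight(age_years, height_cm):
    """Predicted weight (kg) from age and height under the fitted model."""
    return B0 + B_AGE * age_years + B_HEIGHT * height_cm


# 23.47 + 0.589 * 30 + 0.172 * 160 = 23.47 + 17.67 + 27.52
print(round(predict_weight(30, 160), 2))
```

Changing one input while holding the other fixed shows the meaning of each coefficient: one extra year of age adds 0.589 kg to the prediction, and one extra centimetre of height adds 0.172 kg.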


Properties of the Regression Coefficients
• The coefficient of correlation is the geometric mean of the
two regression coefficients: $r = \pm\sqrt{b_{xy} \cdot b_{yx}}$.
• If $b_{xy}$ is positive then $b_{yx}$ is also positive, and vice versa.
• If one regression coefficient is greater than one, the other
must be less than one.
• The coefficient of correlation has the same sign as
the regression coefficients.
• The arithmetic mean of $b_{xy}$ and $b_{yx}$ is equal to or greater than
the coefficient of correlation.
• Regression coefficients are independent of change of origin but not
of scale.
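These properties can be checked on the weight and age data from the slides. In the sketch below the function name `coefficients` is a hypothetical helper, and the shift (subtracting 50 and 80) and the rescaling (years to months) are arbitrary illustrative choices.

```python
import math

age = [34, 38, 28, 68, 64, 62, 42, 39, 70, 60]
weight = [74, 75, 70, 92, 94, 93, 70, 75, 95, 88]


def coefficients(x, y):
    """Return (b_yx, b_xy, r) for paired observations x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sxx, sxy / syy, sxy / math.sqrt(sxx * syy)


b_yx, b_xy, r = coefficients(age, weight)

# r is the geometric mean of the two regression coefficients.
print(math.isclose(math.sqrt(b_yx * b_xy), abs(r)))
# The arithmetic mean of the coefficients is at least r.
print((b_yx + b_xy) / 2 >= r)

# Change of origin: subtracting constants leaves the coefficients unchanged.
shifted = coefficients([a - 50 for a in age], [w - 80 for w in weight])
# Change of scale: age in months divides b_yx by 12 but leaves r unchanged.
in_months = coefficients([a * 12 for a in age], weight)
print(shifted, in_months)
```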
Difference between correlation and regression
1. Correlation studies the degree of relationship between variables.
Regression studies the nature of the relationship between variables.
2. Correlation need not imply a cause-and-effect relationship between variables.
Regression assumes a cause-and-effect direction: the independent variable is
treated as the cause and the dependent variable as the effect.
3. There may be nonsense correlation between the variables.
There is nothing like nonsense regression.
4. The correlation coefficient is independent of change of scale and origin.
The regression coefficients are independent of change of origin but not of scale.
5. The correlation coefficient cannot be used for prediction.
The regression lines can be used for prediction.
Correlation analysis vs Regression analysis
• Regression is the average relationship between
two variables.
• Correlation need not imply a cause-and-effect
relationship between the variables under study.
Regression analysis, by contrast, treats one variable as the
cause (independent) and the other as the effect (dependent).
• There may be nonsense correlation between two
variables. There is no such thing as nonsense
regression.
Difference between Correlation and Regression
Correlation and regression are the most commonly used
techniques for investigating the relationship between
variables.

• Correlation makes no prior assumption as to whether
one variable is dependent on the other(s) and is not
concerned with the form of the relationship between variables. It gives
only the strength and direction of the linear relationship between two
variables.
• Regression attempts to describe the relationship between
one dependent variable and one or more independent
variables. It also shows which variable(s) are …
Difference between Correlation and Regression

• The goal of a correlation analysis is to see whether two
measurement variables covary, whereas regression expresses
the relationship in the form of an equation.
• For example, for students taking a Maths test and an English test, we can
use correlation to determine (through a quantitative value)
whether students who are good at Maths tend to be good at
English as well, or the opposite.
With regression, we can determine whether the marks in
English can be predicted from given marks in Maths.
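To make the contrast concrete, the sketch below uses five pairs of marks that are entirely hypothetical (invented for illustration, not taken from the slides). Correlation gives one number describing how strongly the marks move together; regression gives an equation that predicts an English mark from a Maths mark.

```python
import math

# Hypothetical marks for five students (invented for illustration only).
maths = [40, 50, 60, 70, 80]
english = [45, 50, 65, 60, 80]

n = len(maths)
mean_m = sum(maths) / n
mean_e = sum(english) / n
sxy = sum((m - mean_m) * (e - mean_e) for m, e in zip(maths, english))
sxx = sum((m - mean_m) ** 2 for m in maths)
syy = sum((e - mean_e) ** 2 for e in english)

# Correlation: a single symmetric measure of linear association.
r = sxy / math.sqrt(sxx * syy)

# Regression of English on Maths: an equation usable for prediction.
slope = sxy / sxx
intercept = mean_e - slope * mean_m


def predict_english(maths_mark):
    """Predicted English mark for a given Maths mark."""
    return intercept + slope * maths_mark


print(f"r = {r:.3f}")
print(f"English = {intercept:.1f} + {slope:.1f} * Maths")
print(predict_english(65))
```

For these invented marks the correlation is about 0.92 and the fitted line is English = 12 + 0.8 * Maths, so a Maths mark of 65 gives a predicted English mark of 64.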
Difference between Correlation and Regression
Through a Scatter Diagram (quantitative data):
The starting point is to draw a scatter of points on a graph, with
one variable on the X-axis and the other variable on the Y-axis,
to get a feel for the relationship (if any) between the variables as
suggested by the data. The closer the points are to a straight line,
the stronger the linear relationship between the two variables. A
tight linear pattern indicates strong correlation, and it also means that
the X variable is useful for predicting the Y variable (regression).
