Correlation and Regression Analysis

This document discusses correlation and regression analysis. It begins by defining univariate, bivariate, and multivariate distributions. It then defines correlation as measuring the relationship between two variables, noting that correlated variables change in the same or opposite directions together. Correlation can be positive, negative, linear, or non-linear. Methods for measuring correlation discussed include scatter diagrams and Karl Pearson's coefficient of correlation. The coefficient provides a single numerical measure of the linear relationship between two variables. Key assumptions and calculation methods are also outlined.

Uploaded by

himanshu.goel


Correlation and Regression Analysis

Introduction

A univariate distribution is one in which each unit takes a value on only one
variable. In a bivariate distribution each unit takes values on two variables, and
distributions in which each unit takes values on more than two variables are known
as multivariate distributions.
In bivariate distributions we may be interested to know:
1. Whether there is any relationship between the variables under study.
2. The effect of one variable on the other.
3. How closely the two variables move together.

Correlation is a statistical tool for studying the relationship between variables.
Correlation analysis comprises the methods and techniques used to study and
measure the extent of the relationship between two variables.

Two variables are said to be correlated if a change in one variable is accompanied
by a corresponding change in the other.
The study of correlation deals with the degree (strength) of the mutual statistical
relationship between two or more variables; i.e., correlation studies how closely two
variables, or two series of paired items, move together.
For example :

1.If the price increases the demand decreases.

2.Cases of lung cancer may increase if the smoking habit increases.
3.Sale of woolen garments increases as the temperature decreases.
In the above said examples the two variables move together in same direction or in
opposite direction. But there are cases when two variables move independently and there
is no tendency of ‘going togetherness’ between them.

In correlation we do not deal with one series but rather with the association or
relationship between two series,

and we do not measure variation with one series but rather compare variation in two or
more series.

The two series may vary together in the same direction; or

They may vary together in opposite directions; or
They do not vary together at all
Definitions

Correlation has been defined in different ways:

1. Correlation measures the closeness of the relationship between two variables, or
more exactly the closeness of the linear relationship.
2. In the words of Bodington: “Whenever some definite connection exists between
the two or more groups, classes or series or data there is said to be a
correlation”.
Importance and Utility of Correlation

1. The coefficient of correlation measures the extent of the relationship between
two variables in a single figure.
2. The existence of a relationship between two or more variables enables us to
predict what will happen in the future; e.g., if the production of wheat increases
and all other factors remain constant, the price of wheat may fall.
3. If two variables are closely related, we can estimate the value of one
variable given the value of the other.
4. Correlation facilitates decision-making in business organizations. Expectations
about the behavior of certain variables are also based on correlation analysis.
[Figure: scatter plots illustrating quadratic correlation and linear correlation]
Positive and Negative Correlation (Covariance)

If two variables move together in same direction, the correlation between them
is said to be Positive. If two variables move in opposite directions, the
correlation between them is said to be Negative. If they do not move together
at all there is No Correlation between them.

Example :
1.Since the price and demand move in opposite direction, the correlation
between them is negative.
2.Smoking habit and cases of lung cancer move in the same direction,
correlation between them is positive.
Linear and Non-Linear Correlation

If the change in one variable bears a constant ratio to the change in the other, the
correlation is known as Linear. If the changes are not in a constant ratio, the
correlation is known as Non-Linear.

Example :
1. The law of demand says that, other factors remaining constant, an increase in the
price of a commodity is followed by a decrease in its demand, but the two do not
change in any fixed proportion.
2. A proportionate change can be observed between the consumption of
coffee and the number of employees.
Example :
1. Linear Correlation
x    1    2    3    4    5
y    2    4    6    8    10

2. Non-Linear Correlation
x    1    2    3    4
y    3    5    8    15

Correlation Based on Number of Variables

When only two variables are involved and the relationship is studied between
those two variables, the correlation is known as Simple Correlation. When
more than two variables are involved but the relationship is studied
between two variables only, keeping the other variables constant, the
correlation is known as Partial Correlation. If more than two variables
are involved and the relationship is studied among all of them, the
correlation is known as Multiple Correlation.
Some Important Points

1. There should be sufficient number of items in the series.

2. In correlation analysis we do not deal with one series only but the association
or relationship between two or more series.
3. We do not measure the variation in one series only rather we compare
variation in two or more series.
4. We study only Linear Correlation.
5. Correlation does not necessarily mean cause and effect relationship.
6. The sign of ‘r’ indicates the type of linear relationship whether positive or
negative.
Measure of Correlation

1. Scatter Diagrams.
2. Karl Pearson’s coefficient for measuring linear correlation.
3. Method of Rank Differences (Spearman’s Rank Correlation Coefficient).
Scatter Diagram :
A scatter diagram, or dot diagram, is a graphical representation of pairs of numerical
values of two variables. Each pair of values is represented by a dot on the graph. The
scatter of the points and the direction of the diagram reveal the nature and degree of
correlation between the two variables.
If all the points lie on a straight line with positive slope (i.e. a rising line), the
correlation is said to be perfect positive. In this case the coefficient of correlation is
‘r = +1’.
If all the points lie on a line with negative slope, the correlation is known as perfect
negative. In this case the coefficient of correlation is ‘r = -1’.
In general, if low values of one variable go with low values of the other and high
values go with high values, the path traced by the points runs roughly from the lower
left to the upper right corner, and the relationship is direct, or positive.
If low values of one variable go with high values of the other, and high values with
low values, the path traced by the points runs roughly from the upper left to the
lower right corner, and the relationship is inverse, or negative.
[Figure: scatter diagrams showing positive correlation and negative correlation]
Merits and Limitations of the Scatter-Diagram Method :
1. It is a non-mathematical and easy way to find the correlation between two variables.
2. By drawing a line of best fit freehand through the plotted dots, the method
can be used to estimate a missing value of the dependent variable for a given value
of the independent variable.
3. The shape of the scatter diagram reveals whether the correlation is linear or
non-linear, and whether it is positive or negative, which shows the pattern of the
relationship between the two variables.
4. The values of extreme observations do not affect the method.

Demerits :
It gives only a rough idea of how the two variables are related. The method indicates
the direction of the correlation and whether it is high or low, but it does not give
any quantitative measure of the degree or extent of correlation.
Karl Pearson Coefficient of Correlation

A mathematical method of measuring the intensity or magnitude of the linear
relationship between two variable series was suggested by Karl Pearson
(1857 – 1936), a great British biometrician and statistician.

Karl Pearson’s measure, known as Pearson’s correlation coefficient between two
variables (series) X and Y and usually denoted by ‘r (X, Y)’, ‘rxy’ or simply ‘r’,
is a numerical measure of the linear relationship between them.
Assumptions of Karl Pearson’s Method

1. The variables X and Y under study are linearly related.

2. Each variable is affected by a large number of independent
contributory causes of such a nature as to produce a normal
distribution.
3. The forces operating on each of the variable series are not
independent of each other but are related in a causal fashion.
Calculation of Correlation Coefficient
For ungrouped data, Karl Pearson’s coefficient of correlation can be obtained by
any of the following three methods :
(i) Actual Mean Method
(ii) Direct Method
(iii) Short-Cut Method

Actual Mean Method :

$$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n\,\sigma_x \sigma_y} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}} = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}} $$

where $x = X - \bar{X}$ and $y = Y - \bar{Y}$.
Example:
From the following table calculate Karl Pearson’s coefficient of correlation:

x    6    2    10    4    8
y    9    11    ?    8    7

The arithmetic mean of y is 8.

Solution:

$$ \bar{y} = \frac{\sum y}{n} = \frac{35 + ?}{5} = 8 \;\Rightarrow\; ? = 5 $$

$$ \bar{x} = \frac{\sum x}{n} = \frac{30}{5} = 6 $$

X     Y     x = X – 6    y = Y – 8    x²    y²    xy
6     9      0            1           0     1     0
2     11    -4            3           16    9    -12
10    5      4           -3           16    9    -12
4     8     -2            0           4     0     0
8     7      2           -1           4     1    -2

Σx² = 40, Σy² = 20, Σxy = -26

$$ r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}} = \frac{-26}{\sqrt{40 \times 20}} = -0.92 $$
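The actual mean method can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_actual_mean` is an assumption, and the data are the five pairs from the example with the missing y value taken as 5.

```python
import math

def pearson_actual_mean(xs, ys):
    """Pearson's r from deviations about the actual means (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]          # x = X - mean of X
    dy = [y - mean_y for y in ys]          # y = Y - mean of Y
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

xs = [6, 2, 10, 4, 8]
ys = [9, 11, 5, 8, 7]                      # the missing value filled in as 5
r = pearson_actual_mean(xs, ys)
print(round(r, 2))                         # -0.92
```

The negative sign comes from Σxy = -26: high x values pair with low y values in this data.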
Direct Method :
When the means of the two series are fractional and the number of observations and
their values are not very large, the following simplified form of the formula may be
used to calculate ‘r’.

$$ r = \frac{\dfrac{\sum XY}{N} - \dfrac{\sum X}{N}\cdot\dfrac{\sum Y}{N}}{\sqrt{\dfrac{\sum X^2}{N} - \left(\dfrac{\sum X}{N}\right)^2}\,\sqrt{\dfrac{\sum Y^2}{N} - \left(\dfrac{\sum Y}{N}\right)^2}} = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - \left(\sum X\right)^2}\,\sqrt{N\sum Y^2 - \left(\sum Y\right)^2}} $$
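A minimal Python sketch of the direct method, which works only with raw sums and never computes deviations. The function name `pearson_direct` is an assumption; the two data sets are the linear and non-linear illustrations given earlier in this document.

```python
import math

def pearson_direct(xs, ys):
    """Pearson's r from raw sums: N, sum X, sum Y, sum XY, sum X^2, sum Y^2."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt(n * sx2 - sx ** 2) * math.sqrt(n * sy2 - sy ** 2)
    return numerator / denominator

r_linear = pearson_direct([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
r_curved = pearson_direct([1, 2, 3, 4], [3, 5, 8, 15])
print(round(r_linear, 3), round(r_curved, 3))
```

The perfectly proportional series gives r = 1, while the curved series still gives a high value (about 0.96), a reminder that r measures only the linear part of a relationship.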

Short – Cut Method :
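When the means are fractional, the number of paired observations is large, and the
observations have large values, the computation of ‘r’ can be simplified by using the
deviations of the observations from some suitably chosen constant or constants
(assumed means). The constants for the deviations of X and Y can be either the same or
different. With $d_x = X - A$ and $d_y = Y - B$, the formula for computing the correlation
coefficient from deviations is:

$$ r = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{\sqrt{N\sum d_x^2 - \left(\sum d_x\right)^2}\,\sqrt{N\sum d_y^2 - \left(\sum d_y\right)^2}} = \frac{\dfrac{\sum d_x d_y}{N} - \dfrac{\sum d_x}{N}\cdot\dfrac{\sum d_y}{N}}{\sqrt{\dfrac{\sum d_x^2}{N} - \left(\dfrac{\sum d_x}{N}\right)^2}\,\sqrt{\dfrac{\sum d_y^2}{N} - \left(\dfrac{\sum d_y}{N}\right)^2}} $$

The choice of A and B does not affect the result, because the numerator reduces to the
mean product of deviations about the actual means:

$$ \frac{\sum d_x d_y}{N} - \frac{\sum d_x}{N}\cdot\frac{\sum d_y}{N} = \frac{\sum (X - A)(Y - B)}{N} - \frac{\sum X - NA}{N}\cdot\frac{\sum Y - NB}{N} = \frac{\sum (X - A)(Y - B)}{N} - (\bar{X} - A)(\bar{Y} - B) = \frac{\sum xy}{N} $$

with $x = X - \bar{X}$ and $y = Y - \bar{Y}$, so that $r = \dfrac{\sum xy}{N\,\sigma_x \sigma_y}$.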
Assumptions of Karl Pearson’s Coefficient

Karl Pearson’s coefficient of correlation is based on the following assumptions :-

(i) Linear Relationship :
In this method a linear relationship between the two variables is assumed. In such a
case, the paired observations on the two variables, plotted on a scatter diagram,
cluster around a straight line.
(ii) Causal Relationship :
In studying correlation, we expect a cause and effect relationship between the forces
affecting the values in the two series.

Merits of Karl Pearson’s Coefficient of Correlation

1. It is an important and popular method of measuring the relationship between two
variables. It gives a precise, quantitative value indicating the degree of relationship
existing between the two variables. The value of ‘r’ is easily interpretable.
2. It measures the direction as well as the degree of the relationship between the two
variables.
Demerits of Karl Pearson’s Coefficient of Correlation

1. The value of the coefficient is affected by the extreme values.

2. Its computational procedure is difficult as compared to other methods.
3. It assumes the Linear Relationship between the two variables.

Example 1 :
Calculate the correlation coefficient between the height of father and height of son from the
given data :

Table 1: Heights of Fathers and Sons

Height of Father (in inches)    64  65  66  67  68  69  70
Height of Son (in inches)       66  67  65  68  70  68  72

Table 2: Calculation for ‘r’

Height of Father (X)   Height of Son (Y)   x = X – 67   y = Y – 68   x²   y²   xy
64                     66                  -3           -2           9    4    6
65                     67                  -2           -1           4    1    2
66                     65                  -1           -3           1    9    3
67                     68                   0            0           0    0    0
68                     70                   1            2           1    4    2
69                     68                   2            0           4    0    0
70                     72                   3            4           9    16   12

ΣX = 469, ΣY = 476, Σx² = 28, Σy² = 34, Σxy = 25

$$ \bar{X} = \frac{\sum X}{N} = 67, \qquad \bar{Y} = \frac{\sum Y}{N} = 68 $$

Since the actual means of X and Y are whole numbers, we can use the actual mean method
of computing ‘r’.

$$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}} = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}} = \frac{25}{\sqrt{28 \times 34}} = 0.81 $$
Case :
Table 3 shows the sales revenue and advertisement expenses of a company for past 10
months. Find the coefficient of correlation between the sales and advertisement.

Table 3: Sales and Advertisements for 10 months

Month Jan Feb Mar Apr May Jun Jul Aug Sep Oct
Ad (000 INR) 10 11 12 13 11 10 9 10 11 14
Sales (000 INR) 110 120 115 128 137 145 150 130 120 115

r= - 0.51
Case :

A computer, while calculating the correlation coefficient between two variables X and Y
from 25 pairs of observations, obtained the following results:

n = 25, ΣX = 125, ΣX² = 650, ΣY = 100, ΣY² = 460, ΣXY = 508

It was, however, discovered at the time of checking that two pairs of observations had
been read incorrectly because of a computer bug. They were taken as (6, 14) and (8, 6)
while the correct values were (8, 12) and (6, 8). Prove that the correct value of the
correlation coefficient should be 2/3.
Solution:
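One way to verify the result is to correct each sum by subtracting the contribution of the wrong pairs and adding that of the correct pairs, then apply the direct-method formula. The sketch below assumes the totals stated in the problem; the variable names are illustrative.

```python
import math

n = 25
sum_x, sum_x2 = 125, 650
sum_y, sum_y2 = 100, 460
sum_xy = 508

wrong_pairs = [(6, 14), (8, 6)]
right_pairs = [(8, 12), (6, 8)]

# Remove the wrongly recorded pairs from every total ...
for x, y in wrong_pairs:
    sum_x -= x
    sum_y -= y
    sum_x2 -= x * x
    sum_y2 -= y * y
    sum_xy -= x * y

# ... and add the correct pairs back in.
for x, y in right_pairs:
    sum_x += x
    sum_y += y
    sum_x2 += x * x
    sum_y2 += y * y
    sum_xy += x * y

# Corrected totals: sum X = 125, sum Y = 100, sum X^2 = 650, sum Y^2 = 436, sum XY = 520
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(r)
```

With the corrected totals the numerator is 500 and the denominator is 750, which gives r = 2/3.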
Calculate the coefficient of correlation from the following data:

Age of husband    23  27  28  29  30  31  33  35  36  39
Age of wife       18  22  23  24  25  26  28  29  30  32

X     Y     x = X – 31.1   y = Y – 25.7   x²      y²      xy
23    18    -8.1           -7.7           65.61   59.29   62.37
27    22    -4.1           -3.7           16.81   13.69   15.17
28    23    -3.1           -2.7           9.61    7.29    8.37
29    24    -2.1           -1.7           4.41    2.89    3.57
30    25    -1.1           -0.7           1.21    0.49    0.77
31    26    -0.1            0.3           0.01    0.09    -0.03
33    28     1.9            2.3           3.61    5.29    4.37
35    29     3.9            3.3           15.21   10.89   12.87
36    30     4.9            4.3           24.01   18.49   21.07
39    32     7.9            6.3           62.41   39.69   49.77

ΣX = 311, ΣY = 257, Σx² = 202.9, Σy² = 158.1, Σxy = 178.3

r = 178.3 / √(202.9 × 158.1) ≈ 0.9955

Find Karl Pearson’s coefficient of correlation between sales and expenses of the following
ten firms:

Firm                  1   2   3   4   5   6   7   8   9   10
Sales (000 units)     50  50  55  60  65  65  65  60  60  50
Expenses (000 INR)    11  13  14  16  16  15  15  14  13  13

r = 0.7866
Calculation of Coefficient of Correlation for Grouped Data
Case:

Calculate the coefficient of correlation from the following data:

Marks in          Marks in Finance
Statistics        10    20    30    40    50    Total
5                 2     4     1     4     1     12
10                8     2     5     1     ×     16
15                ×     3     2     1     ×     6
20                ×     1     3     2     4     10
25                ×     ×     4     2     ×     6
Total             10    10    15    10    5     50

Calculation table, with dx = (X – 30)/10 and dy = (Y – 15)/5. Each cell shows the
frequency f, with the product f·dx·dy in brackets:

            X     10       20      30     40      50
            dx    -2       -1      0      +1      +2       f     fdy    fdy2   fdxdy
Y     dy
5     -2          2(+8)    4(+8)   1(0)   4(-8)   1(-4)    12    -24    48     +4
10    -1          8(+16)   2(+2)   5(0)   1(-1)   ×        16    -16    16     +17
15     0          ×        3(0)    2(0)   1(0)    ×        6     0      0      0
20    +1          ×        1(-1)   3(0)   2(+2)   4(+8)    10    +10    10     +9
25    +2          ×        ×       4(0)   2(+4)   ×        6     +12    24     +4

Total f           10       10      15     10      5        50    -18    98     34
fdx               -20      -10     0      +10     +10      -10
fdx2              40       10      0      10      20       80
fdxdy             +24      +9      0      -3      +4       34
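The totals in the table can be fed into the short-cut formula with frequencies, r = [NΣfdxdy − Σfdx·Σfdy] / √{[NΣfdx² − (Σfdx)²][NΣfdy² − (Σfdy)²]}. The Python sketch below rebuilds those totals from the frequency table; the step deviations about 30 and 15 follow the table, while the variable names are assumptions made for illustration.

```python
import math

x_vals = [10, 20, 30, 40, 50]      # marks in Finance (columns)
y_vals = [5, 10, 15, 20, 25]       # marks in Statistics (rows)
freq = [                           # freq[i][j]: students with y_vals[i] and x_vals[j]
    [2, 4, 1, 4, 1],
    [8, 2, 5, 1, 0],
    [0, 3, 2, 1, 0],
    [0, 1, 3, 2, 4],
    [0, 0, 4, 2, 0],
]

# Step deviations about the assumed means 30 and 15, in units of the class step.
dx = [(x - 30) / 10 for x in x_vals]
dy = [(y - 15) / 5 for y in y_vals]

cells = [(freq[i][j], dx[j], dy[i]) for i in range(5) for j in range(5)]
n = sum(f for f, _, _ in cells)
sum_fdx = sum(f * a for f, a, _ in cells)
sum_fdy = sum(f * b for f, _, b in cells)
sum_fdx2 = sum(f * a * a for f, a, _ in cells)
sum_fdy2 = sum(f * b * b for f, _, b in cells)
sum_fdxdy = sum(f * a * b for f, a, b in cells)

r = (n * sum_fdxdy - sum_fdx * sum_fdy) / math.sqrt(
    (n * sum_fdx2 - sum_fdx ** 2) * (n * sum_fdy2 - sum_fdy ** 2)
)
print(round(r, 2))                 # 0.36
```

Because r is unaffected by changes of origin and scale, working in step deviations gives the same coefficient as working with the original marks.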
Probable Error

r ± P.E. (r) gives a range within which the value of the correlation coefficient can
reasonably be expected to lie. If another sample is drawn from the same universe, its
coefficient of correlation is expected to fall within these limits in about half of
all cases.

The standard error of r, S.E.(r), is given by:

$$ S.E.(r) = \frac{1 - r^2}{\sqrt{n}} $$

The probable error of the coefficient of correlation is given by:

$$ P.E.(r) = 0.6745 \times S.E.(r) = 0.6745 \times \frac{1 - r^2}{\sqrt{n}} $$
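A small Python sketch of the two formulas. The function names and the sample values r = 0.8 and n = 25 are hypothetical, chosen only to show the arithmetic.

```python
import math

def standard_error(r, n):
    """S.E.(r) = (1 - r^2) / sqrt(n)."""
    return (1 - r ** 2) / math.sqrt(n)

def probable_error(r, n):
    """P.E.(r) = 0.6745 * S.E.(r)."""
    return 0.6745 * standard_error(r, n)

r, n = 0.8, 25                       # hypothetical sample values
pe = probable_error(r, n)
lower, upper = r - pe, r + pe
print(round(pe, 4), round(lower, 4), round(upper, 4))
```

For these values S.E.(r) = 0.36 / 5 = 0.072 and P.E.(r) is about 0.0486, so the range r ± P.E.(r) runs from roughly 0.751 to 0.849.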
The factor 0.6745 is used because, for a normal distribution, 50% of the observations
lie within μ ± 0.6745σ, where μ is the mean and σ is the standard deviation.
Coefficient of Determination (r2)

The coefficient of determination is often treated as a better measure, as it gives the
proportion of the variation in the dependent variable that is explained by the
independent variable. For example, if the coefficient of correlation between
advertisement and sales is r = 0.80, then r2 = 0.64, which means that 64% of the
variation in sales can be explained by the money spent on advertisement.
Interpretation using Coefficient of Correlation

1. Whether the correlation is positive or negative

2. Coefficient of determination
3. Whether causality is there or not
Illustration: x = 2, 3, 4, 5, 6, 7, 8, 9, 10, with mean μ = 6 and S.D. σ = 2.5
(approximately).

For a normal distribution, mean ± σ contains about 68% of the observations; for the
nine given observations that is about 6 values.
Mean ± σ = [ 6 – 2.5, 6 + 2.5 ] = [ 3.5, 8.5 ], which contains the observed values
4, 5, 6, 7, 8.

Mean ± 2σ contains about 95% of the observations; for the given observations that is
almost all 9 values.
Mean ± 2σ = [ 6 – 5, 6 + 5 ] = [ 1, 11 ], which contains 2, 3, 4, 5, 6, 7, 8, 9, 10.

Mean ± 3σ contains about 99.7% of the observations; for the given observations that is
all 9 values.
Mean ± 3σ = [ 6 – 7.5, 6 + 7.5 ] = [ -1.5, 13.5 ]
Spearman’s Rank Correlation Coefficient

Rank Correlation Coefficient permits us to correlate two sets of qualitative
observations that can be ranked, such as qualitative productivity ratings (poor, fair,
good, very good, etc.) for a group of workers given by two independent observers. It
also gives an idea of whether the two observers have common or different tastes or
likings with respect to a particular attribute or characteristic. Ranks can be assigned
either by two persons to a single characteristic (say beauty, honesty, intelligence,
etc.) or by a single person to two characteristics. When the ranks are assigned by two
persons to a single characteristic, the correlation is found between the opinions or
tastes of the two persons; a high positive correlation indicates that the two persons
have the same taste in that characteristic. If two characteristics are judged by the
same person, e.g., marks obtained in training and quantum of sales, then the
correlation is found between the two characteristics.

[Photograph: Charles Edward Spearman (10.09.1863 – 17.09.1945), English psychologist]
Steps to Calculate Spearman’s Rank Correlation Coefficient

To calculate the Rank Correlation Coefficient :

1. Rank the two series, say the X’s and the Y’s, individually among themselves,
giving rank 1 to the largest (or smallest) value, rank 2 to the second largest
(or second smallest), and so on in each series separately.
2. Find the differences ‘D’ between the corresponding ranks of X and Y.
3. Square these differences and find their sum, i.e., ΣD².
4. Calculate the rank correlation coefficient using the formula :

$$ R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} $$

where ‘N’ denotes the number of paired values.
The above formula is applicable when no value in any of the two series is repeated.
(Repeated values are known as tied values and are given the same Rank). When there are
ties, we assign to each of the observations the mean of the ranks which they jointly
occupy.
For Example:
If the third and fourth largest values of a variable are the same, we assign to each values, the
rank = (3 + 4)/2 = 3.5 and if the fifth, sixth and seventh largest values of a variable are
the same, we assign to each rank = (5 + 6 + 7)/3 = 6.
When some of the values are repeated and average ranks are assigned, the following formula
is used to calculate the rank correlation coefficient:

$$ R = 1 - \frac{6\left[\sum D^2 + \sum \dfrac{m^3 - m}{12}\right]}{N(N^2 - 1)} = 1 - \frac{6\left[\sum D^2 + \sum \dfrac{m(m^2 - 1)}{12}\right]}{N(N^2 - 1)} $$

where m is the number of times a particular value is repeated. Values can be repeated in
one series or in both, and more than one value can be repeated; the correction term is
added once for each repeated value.
Ex:
From the following data, find the coefficient of rank correlation between price and supply.

Price     4   6   8   10  12  14  16  18
Supply    10  15  20  25  30  35  40  45

Solution :

Price   Rank (R1)   Supply   Rank (R2)   D = (R2 – R1)   D2
4       8           10       8           0               0
6       7           15       7           0               0
8       6           20       6           0               0
10      5           25       5           0               0
12      4           30       4           0               0
14      3           35       3           0               0
16      2           40       2           0               0
18      1           45       1           0               0

$$ R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{0}{8(8^2 - 1)} = 1 $$
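The steps above can be sketched in Python for the case with no repeated values. The helper names `ranks_descending` and `spearman_no_ties` are assumptions made for this illustration.

```python
def ranks_descending(values):
    """Rank 1 for the largest value; assumes no value is repeated."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_no_ties(xs, ys):
    """R = 1 - 6 * sum(D^2) / (N * (N^2 - 1)) for untied data."""
    n = len(xs)
    rank_x = ranks_descending(xs)
    rank_y = ranks_descending(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

price = [4, 6, 8, 10, 12, 14, 16, 18]
supply = [10, 15, 20, 25, 30, 35, 40, 45]
print(spearman_no_ties(price, supply))   # 1.0
```

Every rank difference is zero here, so R = 1: price and supply are in exactly the same order.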
Ex:
From the following data, find the coefficient of rank correlation between x and y.

x    50  33  40  10  15  15  65  24  15  57
y    12  12  24  6   15  4   20  9   6   18

Solution :

x     Rank (R1)   y     Rank (R2)   D = (R1 – R2)   D2
50    3           12    5.5         -2.5            6.25
33    5           12    5.5         -0.5            0.25
40    4           24    1           +3.0            9.00
10    10          6     8.5         +1.5            2.25
15    8           15    4           +4.0            16.00
15    8           4     10          -2.0            4.00
65    1           20    2           -1.0            1.00
24    6           9     7           -1.0            1.00
15    8           6     8.5         -0.5            0.25
57    2           18    3           -1.0            1.00

ΣD2 = 41.00
Here, in the X series the value 15 is repeated 3 times, and in the Y series the values
12 and 6 are each repeated twice.
∴ Rank correlation coefficient

$$ R = 1 - \frac{6\left[\sum D^2 + \sum \dfrac{m(m^2 - 1)}{12}\right]}{N(N^2 - 1)} $$

$$ = 1 - \frac{6\left[\sum D^2 + \dfrac{m_1(m_1^2 - 1) + m_2(m_2^2 - 1) + m_3(m_3^2 - 1)}{12}\right]}{N(N^2 - 1)} $$

$$ = 1 - \frac{6\left[41 + \dfrac{3(9 - 1) + 2(4 - 1) + 2(4 - 1)}{12}\right]}{10 \times (100 - 1)} $$

$$ = 1 - \frac{6\left[41 + \dfrac{24 + 6 + 6}{12}\right]}{990} $$

$$ = 1 - \frac{6 \times 44}{990} = 0.733 $$
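The tie-corrected calculation can be sketched in Python as follows. The helper names `average_ranks` and `spearman_with_ties` are assumptions; tied values receive the mean of the ranks they jointly occupy, and a correction term m(m² − 1)/12 is added once for each repeated value.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 for the largest value; tied values share the mean of their ranks."""
    order = sorted(values, reverse=True)
    positions = {}
    for pos, v in enumerate(order, start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def spearman_with_ties(xs, ys):
    """Rank correlation with the correction for repeated values."""
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    correction = sum(
        m * (m * m - 1) / 12
        for series in (xs, ys)
        for m in Counter(series).values()
        if m > 1
    )
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))

x = [50, 33, 40, 10, 15, 15, 65, 24, 15, 57]
y = [12, 12, 24, 6, 15, 4, 20, 9, 6, 18]
print(round(spearman_with_ties(x, y), 3))   # 0.733
```

For this data ΣD² = 41 and the correction is (24 + 6 + 6)/12 = 3, which reproduces the hand calculation.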
X 57 16 24 65 16 16 9 40 48 33
y 19 6 9 20 4 15 6 24 13 13

R = 0.7333

S. No 1 2 3 4 5 6 7 8 9 10
X 12 18 32 18 25 24 25 40 38 22
y 16 15 28 16 24 22 28 36 34 19

R = 0.95

Marks in Statistics 30 38 28 27 28 23 30 33 28 35
Marks in Mathematics 29 27 22 29 20 29 18 21 27 22

R = - 0.3515
Twelve entries in a painting competition were ranked by two judges as shown below:

Entry      A  B  C  D  E  F  G   H  I   J   K   L
Judge 1    5  2  3  4  1  6  8   7  10  9   12  11
Judge 2    4  5  2  1  6  7  10  9  11  12  3   8

What is the degree of agreement between the two judges?

R = 0.46
Regression Analysis

The term Regression means stepping back towards the average. It was first used by
Sir Francis Galton (1822 – 1911), in connection with the inheritance of stature.
Regression means ‘Stepping Back’ or ‘Going Back’

[Figure: The experiment of Francis Galton (latter half of the 19th century), comparing
the average heights of sons of short fathers, the population average, and sons of tall
fathers; the average heights shown range from 5’ 2’’ to 5’ 10’’]
Introduction:

Regression means stepping back towards the average. In statistics, regression
analysis is applicable to all those fields where two or more related variables have
a tendency to go back to the mean.

According to Blair, “Regression is the measure of average relationship between
two or more variables in terms of original units of data.”

The chief objective of regression analysis is to know the nature of the relationship
between two variables and to use it for predicting the most likely value of the
dependent variable corresponding to a given known value of the independent
variable. However, it may be noted that the regression relation is not reversible,
i.e., the regression equation used to predict the value of Y from a given value of X
cannot be used to predict the value of X from a given value of Y.

So, the regression relation is an average, irreversible and functional relation.

Methods of Studying Regression:

Regression can be studied either:

(i)Graphically
(ii)Algebraically

In the graphical method a scatter plot for the series is prepared and two
regression lines are drawn for predicting the values of the X and Y variables.
The regression line that is used to predict the value of Y on the basis of X is
known as Y on X, and the line that is used to predict the value of X for a known
value of Y is known as X on Y.
In the case of perfect correlation between X and Y (+1 or -1) there is only one
regression line. In other words, the two lines are identical.
Whenever a straight line is drawn to represent changes in dependent variable
with respect to independent variable, the regression is known as Linear
regression. If however the relationship between two variables can not be
represented through straight line the regression is known as Non-Linear
regression.

Regression lines can be drawn by one of the following methods:

1.Free hand curve method

2.Method of least squares

Free Hand Curve Method

- Plot the pairs of values of X and Y on a scatter diagram.
- Draw the first regression line in such a way that the positive deviations of the
points from the line, measured parallel to the Y axis, are cancelled by the negative
deviations. This line is called Y on X.
- Draw the second regression line in such a way that the positive deviations of the
points from the line, measured parallel to the X axis, are cancelled by the negative
deviations. This line is called X on Y.
- The two regression lines cut each other at a point, known as the
mean point of the two series.

Method of Least Squares

In order to avoid the difficulties of the free hand curve method, a
mathematical relationship is established between the movements of the X and Y
series, and algebraic equations are obtained to represent their relative
movements.
For the line of Y on X, Y = a + bX, the two normal equations are:

$$ \sum Y = Na + b\sum X $$

$$ \sum XY = a\sum X + b\sum X^2 $$

The normal equations for the line of X on Y are obtained by interchanging X and Y.
Illustration:
Plot the regression lines associated with the following data:

X 1 2 3 4 5
Y 166 184 142 180 338
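For the line Y = a + bX, the least-squares normal equations ΣY = Na + bΣX and ΣXY = aΣX + bΣX² can be solved directly for a and b. The Python sketch below does this for the illustration data; the function name `fit_line` is an assumption, and no plotting is attempted.

```python
def fit_line(xs, ys):
    """Solve the two normal equations for the line ys = a + b * xs."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # slope
    a = (sy - b * sx) / n                           # intercept
    return a, b

X = [1, 2, 3, 4, 5]
Y = [166, 184, 142, 180, 338]

a_yx, b_yx = fit_line(X, Y)    # regression line of Y on X
a_xy, b_xy = fit_line(Y, X)    # regression line of X on Y (roles interchanged)
print(a_yx, b_yx)              # 100.0 34.0
print(round(a_xy, 4), round(b_xy, 4))
```

The line of Y on X comes out as Y = 100 + 34X. The line of X on Y has a different slope, which is why two separate lines are needed.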
Why do we need two regression lines to find the values of the two variables X and Y?

Since the regression relation is irreversible, one equation is not sufficient to
predict the values of both variables X and Y. Moreover, the two regression equations
are derived under different sets of assumptions; therefore one equation is not
sufficient to find X and Y.

Method of Deviations From the Mean

As the method of least squares is tedious and involves a lot of calculation, the
method of deviations from the mean was developed to obtain the regression lines:

$$ Y - \bar{Y} = b_{YX}\,(X - \bar{X}) \qquad (1) $$

and

$$ X - \bar{X} = b_{XY}\,(Y - \bar{Y}) \qquad (2) $$

where,

$\bar{X}$ = Mean of series X.
$\bar{Y}$ = Mean of series Y.
$b_{YX} = r\,\dfrac{\sigma_Y}{\sigma_X}$ = Regression coefficient of Y on X.
$b_{XY} = r\,\dfrac{\sigma_X}{\sigma_Y}$ = Regression coefficient of X on Y.
Properties of Regression Coefficients:

1. Both regression coefficients must have the same sign, since

$$ b_{YX} \times b_{XY} = r^2 $$

2. The correlation coefficient is the G.M. of the two regression coefficients, taking
the common sign of the coefficients:

$$ r = \pm\sqrt{b_{YX} \cdot b_{XY}} $$

3. Both regression coefficients cannot be greater than 1 in magnitude at the same time.

4. Regression coefficients denote the rate of change.

Find two regression lines from the following data

Sales 91 97 108 121 67 124 51 73 111 57

Purchase 71 75 69 97 70 91 39 61 80 47
Example:

A survey was conducted to study the relationship between expenditure on
accommodation (x) and expenditure on entertainment (y), and the
following results were obtained:

Expenditure on     Mean     S.D.
Accommodation      173      66
Entertainment      47.58    22
Correlation coefficient: 0.57

Estimate the expenditure on entertainment if the expenditure on
accommodation is 200.
Solution:

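Here,

$$ \bar{X} = 173, \quad \bar{Y} = 47.58, \quad \sigma_x = 66, \quad \sigma_y = 22, \quad r = 0.57 $$

$$ b_{YX} = r\,\frac{\sigma_y}{\sigma_x} = 0.57 \times \frac{22}{66} = 0.19 $$

$$ Y - \bar{Y} = b_{YX}\,(X - \bar{X}) \;\Rightarrow\; Y = 14.71 + 0.19X $$

For X = 200:

$$ Y = 14.71 + 0.19 \times 200 = 52.71 $$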
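The same estimate can be reproduced from the summary statistics alone. In the sketch below the function name `regression_y_on_x` is an assumption; the inputs are the means, standard deviations and correlation coefficient used in the solution.

```python
def regression_y_on_x(mean_x, mean_y, sd_x, sd_y, r):
    """Return (a, b) for the line Y = a + b * X, with b = r * sd_y / sd_x."""
    b = r * sd_y / sd_x
    a = mean_y - b * mean_x
    return a, b

a, b = regression_y_on_x(mean_x=173, mean_y=47.58, sd_x=66, sd_y=22, r=0.57)
estimate = a + b * 200          # expenditure on entertainment when X = 200
print(round(a, 2), round(b, 2), round(estimate, 2))   # 14.71 0.19 52.71
```

Only the line of Y on X is used here, because the value being estimated is Y (entertainment) for a given X (accommodation).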

The Irreversible Relation:

1. An increase in family income leads to an increase in expenditure, but an
increase in the family’s expenditure does not lead to an increase in family
income.

2. If the rainfall is timely and good the crop will be good, but if the crop is good
there is no guarantee that the rainfall was timely and good.
Difference between Correlation and Regression:

1. Correlation: It is merely concerned with determining how strongly the two variables
   are linearly related.
   Regression: It precedes correlation.

2. Correlation: Not able to solve prediction problems.
   Regression: It solves prediction problems.

3. Correlation: The coefficient of correlation is independent of change of origin and
   scale.
   Regression: The coefficient of regression is independent of change of origin only.

4. Correlation: The coefficient of correlation satisfies the relation – 1 ≤ r ≤ + 1.
   Regression: The product of the two regression coefficients equals r2 and satisfies
   the relation 0 ≤ r2 ≤ 1.
Regression Lines:

“The device used for estimating the value of one variable from the value of other
consist of a line through the points drawn in such a manner as to represent the
average relationship between the two variables. Such a line is called the line of
regression”.

J R Stockton
As per the method of least squares, the two regression lines are:

$$ Y - \bar{Y} = b_{YX}\,(X - \bar{X}) \qquad (1) $$

and

$$ X - \bar{X} = b_{XY}\,(Y - \bar{Y}) \qquad (2) $$

where,

$\bar{X}$ = Mean of series X.
$\bar{Y}$ = Mean of series Y.
$b_{YX} = r\,\dfrac{\sigma_Y}{\sigma_X}$ = Regression coefficient of Y on X.
$b_{XY} = r\,\dfrac{\sigma_X}{\sigma_Y}$ = Regression coefficient of X on Y.
Properties of Regression Lines:

1. The point (A.M. of X, A.M. of Y) lies on both regression lines.

2. If r = 0, the two regression lines are perpendicular to each other.

3. If the two regression lines are identical, the correlation between the variables is
perfect.

4. The angle θ between the regression lines is given by

$$ \tan\theta = \left(\frac{1 - r^2}{r}\right)\left(\frac{\sigma_X\,\sigma_Y}{\sigma_X^2 + \sigma_Y^2}\right) $$
Meaning of Regression

•Regression means an act of returning or going back.

•The term was first used in 1877 by Sir Francis Galton while studying the
relationship between the heights of fathers and sons.

•The statistical tool with the help of which we can estimate (predict) the unknown
values of one variable from known values of another variable is called Regression.
Significance of Regression Analysis

It can be expressed under following heads:

1. The relationship of cause and effect between two or more variables can be
analyzed with the help of regression analysis.

2. The change in the value of one variable can be determined from regression
coefficient if there is change of a unit in the value of other variable.

3. It provides estimates of values of the dependent variable on the basis of values

of the independent variable in the areas of social, economic and business
activities.

4. In the field of business, regression is very useful because with its help a
businessman can predict future production, consumption, investment,
prices, profits, sales, etc.
Types of Regression

1. Simple regression :

If regression analysis is based on only two variables, it is called simple regression.
A simple regression is confined to two variables, say X and Y. Here ‘X’ is the
independent variable and ‘Y’ is the dependent variable. The functional relationship
between X and Y is expressed as
Y = f(X)

2. Multiple regression:

If more than two variables are studied at a time in regression analysis, it is called
multiple regression. A multiple regression analysis is made among more than two
related variables at a time, say X, Y and Z. The functional relationship in such a
case is expressed as

Y = f(X, Z), or X = f(Y, Z), or Z = f(X, Y).

3. Linear regression: If the regression line is a straight line, there is linear
regression between the variables under study. In linear regression the value of the
dependent variable changes at a constant rate for a unit change in the value of the
independent variable. This constant change may be in terms of absolute amount or
percentage.

4. Curvi-linear or non-linear regression: If the regression line is not a
straight line but a smooth curve, the regression is termed curvi-linear or non-linear.
Regression Equations

X on Y
• This equation describes the variation in the values of X for the given changes in Y.
• It estimates the value of X for the given value of Y.

$$ (X - \bar{X}) = r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y}) $$

X = Value of X variable to be predicted
$\bar{X}$ = Arithmetic Mean of X series
r = Correlation Coefficient of X and Y series
$\sigma_x$ = Standard deviation of X series
$\sigma_y$ = Standard deviation of Y series
Y = That value of Y variable, corresponding to which the value of X variable is to be
predicted
$\bar{Y}$ = Arithmetic Mean of Y series

Y on X
• This equation describes the variation in the values of Y for the given changes in X.
• It estimates the value of Y for the given value of X.

$$ (Y - \bar{Y}) = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X}) $$

Y = Value of Y variable to be predicted
$\bar{Y}$ = Arithmetic Mean of Y series
r = Correlation Coefficient of X and Y series
$\sigma_x$ = Standard deviation of X series
$\sigma_y$ = Standard deviation of Y series
X = That value of X variable, corresponding to which the value of Y variable is to be
predicted
$\bar{X}$ = Arithmetic Mean of X series
EXAMPLE 1
• The following information is given:

                        Husband's age    Wife's age
MEAN                    25 years         22 years
STANDARD DEVIATION      4 years          5 years

• Coefficient of correlation between the ages of husbands and wives = +0.8
• Find the expected age of the husband when the wife's age is 12 years, and the
expected age of the wife when the husband's age is 33 years.

Given that: $\bar{X} = 25$, $\bar{Y} = 22$, $\sigma_x = 4$, $\sigma_y = 5$, $r = 0.8$,
where X is the husband's age and Y is the wife's age.

Regression equation of X on Y

$$
\begin{aligned}
X - \bar{X} &= r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y}) \\
X - 25 &= 0.8 \times \frac{4}{5}\,(Y - 22) \\
X - 25 &= 0.64\,(Y - 22) \\
X &= 0.64Y - 14.08 + 25 \\
X &= 0.64Y + 10.92
\end{aligned}
$$

If the age of the wife is 12 years, then the expected age of the husband is:
X = 0.64 × 12 + 10.92
= 7.68 + 10.92
= 18.60 years

Regression equation of Y on X

$$
\begin{aligned}
Y - \bar{Y} &= r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X}) \\
Y - 22 &= 0.8 \times \frac{5}{4}\,(X - 25) \\
Y - 22 &= 1.0\,(X - 25) \\
Y &= X - 25 + 22 \\
Y &= X - 3
\end{aligned}
$$

If the age of the husband is 33 years, then the expected age of the wife is:
Y = 33 - 3
= 30 years
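The arithmetic of Example 1 can be checked with a short Python sketch. It assumes only the summary statistics given in the example; the function name `regression_lines` and the variable names are illustrative and do not come from the slides.

```python
def regression_lines(mean_x, mean_y, sd_x, sd_y, r):
    """Build both regression lines from means, standard deviations and r."""
    b_xy = r * sd_x / sd_y  # slope of the X on Y line
    b_yx = r * sd_y / sd_x  # slope of the Y on X line

    def x_on_y(y):
        # Estimate X for a given Y: X - mean_x = b_xy * (Y - mean_y)
        return mean_x + b_xy * (y - mean_y)

    def y_on_x(x):
        # Estimate Y for a given X: Y - mean_y = b_yx * (X - mean_x)
        return mean_y + b_yx * (x - mean_x)

    return x_on_y, y_on_x


# X = husband's age, Y = wife's age (values from Example 1)
x_on_y, y_on_x = regression_lines(mean_x=25, mean_y=22, sd_x=4, sd_y=5, r=0.8)

husband_age = x_on_y(12)  # expected husband's age when the wife is 12
wife_age = y_on_x(33)     # expected wife's age when the husband is 33
print(round(husband_age, 2), round(wife_age, 2))
```

Each estimate uses its own line: X on Y for predicting X, and Y on X for predicting Y. The two lines are not algebraic inverses of each other unless r = ±1.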
REGRESSION COEFFICIENTS

The regression coefficient indicates the average change in the value of one
variable for a unit change in the value of the other variable. Since there are
two regression equations, there are two regression coefficients: the regression
coefficient of X on Y and the regression coefficient of Y on X.

• Regression coefficient of X on Y (bxy)
• This coefficient represents the change in the value of X for a unit change in
the value of the variable Y.
• When the X and Y series are given and deviations have been taken from an
assumed mean in one or both series:

$$b_{xy} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_y^2 - \left(\sum d_y\right)^2}$$

• Regression coefficient of Y on X (byx)
• This coefficient represents the change in the value of Y for a unit change in
the value of the variable X.
• When the X and Y series are given and deviations have been taken from an
assumed mean in one or both series:

$$b_{yx} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_x^2 - \left(\sum d_x\right)^2}$$

Here dx and dy are the deviations of X and Y from their assumed means, and N is
the number of pairs of observations.
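As a consistency check, here is a short sketch of how the assumed-mean formulas connect to the slopes in the regression equations. It assumes the deviations are taken from the actual means instead of assumed means, so that the deviation sums vanish, and that the standard deviations use divisor N.

```latex
% With d_x = X - \bar{X} and d_y = Y - \bar{Y}, we have \sum d_x = \sum d_y = 0, so
\[
b_{xy} = \frac{\sum d_x d_y}{\sum d_y^2}
       = \frac{N\, r\, \sigma_x \sigma_y}{N\, \sigma_y^2}
       = r\,\frac{\sigma_x}{\sigma_y},
\qquad
b_{yx} = \frac{\sum d_x d_y}{\sum d_x^2}
       = r\,\frac{\sigma_y}{\sigma_x}.
\]
```

Shifting every X or Y by a constant changes neither numerator nor denominator of the assumed-mean formulas, which is why any convenient assumed mean gives the same coefficients.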
Example 2

From the following data calculate:

(a) the two regression coefficients
(b) the two regression equations

Population (in thousands)   18  19  20  21  22  23  24  25  26  27
No. of TV sets demanded     14  16  16  18  18  19  20  20  21  21

Let X be the population and Y the number of TV sets demanded. Deviations are
taken from the assumed means A = 23 for X and A = 18 for Y.

X       dx (from 23)   d²x    Y       dy (from 18)   d²y    dxdy
18      -5             25     14      -4             16     20
19      -4             16     16      -2             4      8
20      -3             9      16      -2             4      6
21      -2             4      18      0              0      0
22      -1             1      18=A    0              0      0
23=A    0              0      19      1              1      0
24      1              1      20      2              4      2
25      2              4      20      2              4      4
26      3              9      21      3              9      9
27      4              16     21      3              9      12

ΣX = 225, Σdx = -5, Σd²x = 85, ΣY = 183, Σdy = 3, Σd²y = 51, Σdxdy = 61
REGRESSION COEFFICIENT

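X on Y

$$b_{xy} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_y^2 - \left(\sum d_y\right)^2}
= \frac{10 \times 61 - (-5 \times 3)}{10 \times 51 - (3)^2}
= \frac{610 + 15}{510 - 9}
= \frac{625}{501}
= 1.247$$

Y on X

$$b_{yx} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_x^2 - \left(\sum d_x\right)^2}
= \frac{10 \times 61 - (-5 \times 3)}{10 \times 85 - (-5)^2}
= \frac{610 + 15}{850 - 25}
= \frac{625}{825}
= 0.758$$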
(b) Regression Equations
Regression equation of X on Y: This equation describes the variation in the
values of X for the given changes in Y.

$$
\begin{aligned}
X - \bar{X} &= b_{xy}\,(Y - \bar{Y}) \\
X - 22.5 &= 1.247\,(Y - 18.3) \\
X - 22.5 &= 1.247Y - 22.82 \\
X &= 1.247Y - 22.82 + 22.5 \\
X &= 1.247Y - 0.32
\end{aligned}
$$

Regression equation of Y on X: This equation describes the variation in the
values of Y for the given changes in X.

$$
\begin{aligned}
Y - \bar{Y} &= b_{yx}\,(X - \bar{X}) \\
Y - 18.3 &= 0.758\,(X - 22.5) \\
Y - 18.3 &= 0.758X - 17.055 \\
Y &= 0.758X - 17.055 + 18.3 \\
Y &= 0.758X + 1.245
\end{aligned}
$$
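The steps of Example 2 can be sketched in Python as follows. The data and the assumed means (23 for X, 18 for Y) are taken from the example; the function and variable names are illustrative assumptions, not part of the original slides.

```python
population = [18, 19, 20, 21, 22, 23, 24, 25, 26, 27]  # X, in thousands
tv_sets = [14, 16, 16, 18, 18, 19, 20, 20, 21, 21]     # Y, sets demanded


def regression_coefficients(xs, ys, assumed_x, assumed_y):
    """Return (b_xy, b_yx) using deviations from assumed means."""
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    # Both coefficients share the same numerator
    numerator = n * sum_dxdy - sum_dx * sum_dy
    b_xy = numerator / (n * sum_dy2 - sum_dy ** 2)
    b_yx = numerator / (n * sum_dx2 - sum_dx ** 2)
    return b_xy, b_yx


b_xy, b_yx = regression_coefficients(population, tv_sets, 23, 18)
mean_x = sum(population) / len(population)
mean_y = sum(tv_sets) / len(tv_sets)

# Rearranged forms: X = b_xy * Y + a_x  and  Y = b_yx * X + a_y
a_x = mean_x - b_xy * mean_y
a_y = mean_y - b_yx * mean_x
print(f"X = {b_xy:.3f}Y {a_x:+.3f}")
print(f"Y = {b_yx:.3f}X {a_y:+.3f}")
```

Because this sketch keeps the coefficients unrounded, its intercepts differ slightly from the hand calculation, which rounds bxy and byx to three decimals before multiplying by the means.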
REGRESSION LINES
• Regression lines are the lines of best fit expressing the mutual average
relationship between two series. These lines give the best estimate of one
variable for any given value of the other variable.

• In the case of two variables X and Y, we have two regression lines:
X on Y and Y on X.
• Obtain the regression equations of Y on X and X on Y from the following table
giving the sales of goods "X" and goods "Y".

Sale of goods "Y"    Sale of goods "X" (in units)
(in units)           5-15    15-25    25-35    35-45    Total
0-10                 1       1        -        -        2
10-20                3       6        5        1        15
20-30                1       8        9        2        20
30-40                -       3        9        3        15
40-50                -       -        4        4        8
Total                5       18       27       10       60
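Step deviations are taken from the assumed means A = 20 for X and A = 25 for Y,
with class interval i = 10 in both series. Each cell shows the frequency f with
the product f·dx·dy in parentheses.

                     X:   5-15     15-25    25-35    35-45
                     M.P. 10       20       30       40
                     dx   -1       0        1        2
Y        M.P.   dy                                            f      fdy    fdy²   fdxdy
0-10     5      -2        1 (2)    1 (0)    -        -        2      -4     8      2
10-20    15     -1        3 (3)    6 (0)    5 (-5)   1 (-2)   15     -15    15     -4
20-30    25     0         1 (0)    8 (0)    9 (0)    2 (0)    20     0      0      0
30-40    35     1         -        3 (0)    9 (9)    3 (6)    15     15     15     15
40-50    45     2         -        -        4 (8)    4 (16)   8      16     32     24
f                         5        18       27       10       N=60   Σfdy=12   Σfdy²=70   Σfdxdy=37
fdx                       -5       0        27       20       Σfdx = 42
fdx²                      5        0        27       40       Σfdx² = 72
fdxdy                     5        0        12       20       Σfdxdy = 37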
Regression Equation
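• Y on X

$$Y - \bar{Y} = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X})$$

$$r\,\frac{\sigma_y}{\sigma_x}
= \frac{N\sum f d_x d_y - \sum f d_x \sum f d_y}{N\sum f d_x^2 - \left(\sum f d_x\right)^2} \times \frac{i_y}{i_x}
= \frac{60(37) - (42)(12)}{60(72) - (42)^2} \times \frac{10}{10}
= \frac{2220 - 504}{4320 - 1764}
= \frac{1716}{2556}
= 0.67$$

$$\bar{Y} = A + \frac{\sum f d_y}{N} \times i = 25 + \frac{12}{60} \times 10 = 27$$

$$\bar{X} = A + \frac{\sum f d_x}{N} \times i = 20 + \frac{42}{60} \times 10 = 27$$

$$
\begin{aligned}
Y - 27 &= 0.67\,(X - 27) = 0.67X - 18.09 \\
Y &= 27 + 0.67X - 18.09 \\
Y &= 8.91 + 0.67X
\end{aligned}
$$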
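• X on Y

$$X - \bar{X} = r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y})$$

$$r\,\frac{\sigma_x}{\sigma_y}
= \frac{N\sum f d_x d_y - \sum f d_x \sum f d_y}{N\sum f d_y^2 - \left(\sum f d_y\right)^2} \times \frac{i_x}{i_y}
= \frac{60(37) - (42)(12)}{60(70) - (12)^2} \times \frac{10}{10}
= \frac{2220 - 504}{4200 - 144}
= \frac{1716}{4056}
= 0.423$$

$$
\begin{aligned}
X - 27 &= 0.423\,(Y - 27) = 0.423Y - 11.42 \\
X &= 15.58 + 0.423Y
\end{aligned}
$$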
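The grouped-data calculation above can be reproduced with a short Python sketch. It assumes the frequency table of the example with rows as Y classes and columns as X classes, assumed means 20 (X) and 25 (Y), and class interval 10; all variable names are illustrative.

```python
# Rows: Y classes 0-10 ... 40-50; columns: X classes 5-15 ... 35-45
freq = [
    [1, 1, 0, 0],
    [3, 6, 5, 1],
    [1, 8, 9, 2],
    [0, 3, 9, 3],
    [0, 0, 4, 4],
]
x_mid = [10, 20, 30, 40]
y_mid = [5, 15, 25, 35, 45]
a_x, a_y = 20, 25   # assumed means
i_x, i_y = 10, 10   # class intervals

# Step deviations of the mid-points
dx = [(m - a_x) / i_x for m in x_mid]
dy = [(m - a_y) / i_y for m in y_mid]

cells = [(freq[r][c], dx[c], dy[r]) for r in range(len(dy)) for c in range(len(dx))]
n = sum(f for f, _, _ in cells)
sum_fdx = sum(f * u for f, u, _ in cells)
sum_fdy = sum(f * v for f, _, v in cells)
sum_fdx2 = sum(f * u * u for f, u, _ in cells)
sum_fdy2 = sum(f * v * v for f, _, v in cells)
sum_fdxdy = sum(f * u * v for f, u, v in cells)

numerator = n * sum_fdxdy - sum_fdx * sum_fdy
b_yx = numerator / (n * sum_fdx2 - sum_fdx ** 2) * (i_y / i_x)
b_xy = numerator / (n * sum_fdy2 - sum_fdy ** 2) * (i_x / i_y)

mean_x = a_x + sum_fdx / n * i_x
mean_y = a_y + sum_fdy / n * i_y

print(f"Y on X: Y - {mean_y:.0f} = {b_yx:.3f}(X - {mean_x:.0f})")
print(f"X on Y: X - {mean_x:.0f} = {b_xy:.3f}(Y - {mean_y:.0f})")
```

The factors `i_y / i_x` and `i_x / i_y` convert the slopes from step-deviation units back to the original units. They equal 1 here only because both class intervals are 10.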
THANK YOU
