BIVARIATE ANALYSIS

Bivariate data involves two variables that are recorded simultaneously from individuals in a group. Examples include heights and weights of students, age and blood pressure of individuals, and income and expenditure of families. Correlation refers to the association or interdependence between two variables. Positive correlation means a variable increases on average as the other increases, while negative correlation means a variable decreases as the other increases. Scatter diagrams graphically display the relationship between two variables with points plotted on a coordinate plane. Pearson's product-moment correlation coefficient, denoted by r, measures the strength and direction of linear relationships between variables on a scale from -1 to 1.

Bivariate data: Data on two variables recorded simultaneously from a group of individuals are called bivariate data.

Example: i) Heights (x) and weights (y) of students.

ii) Age (x) and blood pressure (y) of a group of individuals.

iii) Marks in a test (x) and marks in the final exam (y) of students.

iv) Income (x) and expenditure (y) of a number of families, etc.

What do you mean by correlation?

By correlation we mean association or interdependence between two variables.

If two variables are so related that a change in the magnitude of one variable is accompanied by a change in the magnitude of the other, then the two variables are said to be correlated or associated.

Positive Correlation:

If a variable is found to increase (decrease) on an average with the increase (decrease) of the other variable, then the variables are said to be positively correlated.

Negative Correlation:

If a variable is found to decrease (increase) on an average with the increase (decrease) of the other variable, then the variables are said to be negatively correlated.

Bivariate frequency distribution: For grouped data the n pairs are arranged in a two-way table, with the x classes x_1 − x_1′, …, x_k − x_k′ as rows, the y classes y_1 − y_1′, …, y_l − y_l′ as columns, and f_ij the frequency of the (i, j)-th cell:

              y_1 − y_1′   y_2 − y_2′   ...   y_j − y_j′   ...   y_l − y_l′   Total
x_1 − x_1′    f_11         f_12         ...   f_1j         ...   f_1l         f_10
x_2 − x_2′    f_21         f_22         ...   f_2j         ...   f_2l         f_20
...           ...          ...          ...   f_ij         ...   ...          f_i0
x_k − x_k′    f_k1         f_k2         ...   f_kj         ...   f_kl         f_k0
Total         f_01         f_02         ...   f_0j         ...   f_0l         n

n = Σ_{i=1}^{k} Σ_{j=1}^{l} f_ij = Σ_{i=1}^{k} f_i0 = Σ_{j=1}^{l} f_0j.
Conditional distribution of x given that y belongs to the j-th class:

Value of x      Frequency
x_1 − x_1′      f_1j
x_2 − x_2′      f_2j
...             f_ij
x_k − x_k′      f_kj
Total           f_0j

Marginal distribution of y:

Value of y      Frequency
y_1 − y_1′      f_01
y_2 − y_2′      f_02
...             f_0j
y_l − y_l′      f_0l
Total           n

Scatter Diagram: A scatter diagram is one of the diagrammatic representations of bivariate data. It is the diagram of points obtained by plotting each pair of observations (x_i, y_i) as a point on a graph with two mutually perpendicular coordinate axes.

[Figure: four scatterplots of y vs x, panels (i)–(iv), showing different patterns of association; x runs from 0 to 50 in each panel.]
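Such diagrams can be produced with any plotting library. The following is a minimal matplotlib sketch with made-up data (the sample values and the positive-slope pattern are illustrative assumptions, not taken from these notes):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(0, 50, 40)           # illustrative x values
y = 2 * x + rng.normal(0, 8, 40)     # roughly linear, positive association

plt.scatter(x, y)                    # each pair (x_i, y_i) becomes one point
plt.xlabel("x")
plt.ylabel("y")
plt.title("Scatterplot of y vs x")
plt.show()
```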

Pearson's product-moment correlation coefficient:

r = [(1/n) Σ_{i=1}^{n} x_i′ y_i′] / (s_x s_y) = [(1/n) Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ)] / (s_x s_y), where x_i′ = (x_i − x̄), y_i′ = (y_i − ȳ).

r > 0 : positive linear relation (the closer r is to ±1, the stronger the relation);
r < 0 : negative linear relation;
r = 0 : no linear relation.

Working formula for r:

r = [(1/n) Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ)] / (s_x s_y) = cov(x, y) / (s_x s_y), where cov(x, y) = (1/n) Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ).

Expanding the product,

cov(x, y) = (1/n) Σ x_i y_i − x̄ȳ − x̄ȳ + x̄ȳ = (1/n) Σ x_i y_i − x̄ȳ,

so that

r = [n Σ x_i y_i − Σ x_i Σ y_i] / (n² s_x s_y).
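As a quick sketch (not from the notes), the definition and the working formula can be checked against each other in numpy, using divide-by-n moments as in this section; the sample values are made up:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# Definition: r = cov(x, y) / (s_x * s_y), with divide-by-n moments
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
r_def = cov_xy / (x.std() * y.std())      # np.std divides by n by default

# Working formula: r = (n*Σxy − Σx*Σy) / (n² s_x s_y)
r_work = (n * np.sum(x * y) - x.sum() * y.sum()) / (n**2 * x.std() * y.std())

print(r_def, r_work, np.corrcoef(x, y)[0, 1])   # all three agree
```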

Some results on covariance:

i) Cov(a, b) = 0 (a, b constants)
ii) Cov(ax + b, c) = 0
iii) Cov(ax + b, cy + d) = ac Cov(x, y)
iv) Cov(ax + by + k, cx + dy + l) = ac Var(x) + bd Var(y) + (ad + bc) Cov(x, y)

Proof:

i) Here x_i = a, y_i = b ∀ i, so x̄ = a, ȳ = b, and

Cov(a, b) = (1/n) Σ (x_i − x̄)(y_i − ȳ) = (1/n) × 0 = 0.

ii) Do it yourself.

iii) Put p_i = ax_i + b, q_i = cy_i + d ∀ i, so that p_i − p̄ = a(x_i − x̄) and q_i − q̄ = c(y_i − ȳ). Then

Cov(p, q) = (1/n) Σ a(x_i − x̄) · c(y_i − ȳ) = ac Cov(x, y).

iv) Do it yourself.

Some results on variance:

i) V(ax + by + c) = a² V(x) + b² V(y) + 2ab Cov(x, y)
ii) V(Σ_{i=1}^{k} x_i) = Σ_{i=1}^{k} V(x_i) + 2 Σ Σ_{i<j} Cov(x_i, x_j)

Proof:

i) Put u = ax + by + c, so that ū = ax̄ + bȳ + c.

∴ a(x_i − x̄) + b(y_i − ȳ) = (u_i − ū)

V(ax + by + c) = V(u) = (1/n) Σ {a²(x_i − x̄)² + b²(y_i − ȳ)² + 2ab(x_i − x̄)(y_i − ȳ)}
= a² V(x) + b² V(y) + 2ab Cov(x, y).
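A minimal numeric check of covariance result iii) and variance result i), with arbitrary made-up constants and simulated data (an illustrative sketch, not part of the notes):

```python
import numpy as np

def cov(u, v):
    """Divide-by-n covariance, as used in these notes."""
    return np.mean((u - u.mean()) * (v - v.mean()))

rng = np.random.default_rng(1)
x, y = rng.normal(size=50), rng.normal(size=50)
a, b, c, d = 2.0, -3.0, 0.5, 4.0          # arbitrary constants

# iii) Cov(ax + b, cy + d) = ac Cov(x, y)
print(np.isclose(cov(a * x + b, c * y + d), a * c * cov(x, y)))

# i) V(ax + by + c) = a² V(x) + b² V(y) + 2ab Cov(x, y)   (np.var divides by n)
lhs = np.var(a * x + b * y + c)
rhs = a**2 * np.var(x) + b**2 * np.var(y) + 2 * a * b * cov(x, y)
print(np.isclose(lhs, rhs))
```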

Properties of the correlation coefficient:

i) It is a pure (unit-free) number.
ii) r_xy = r_yx.
iii) The numerical value of the correlation coefficient is independent of change of origin and change of scale; only its sign changes, when the scale factors have opposite signs (see the proof below).
iv) −1 ≤ r ≤ 1.
v) The correlation coefficient is not a resistant measure; it is highly affected by outliers.
vi) Correlation coefficient = 0 does not necessarily imply that the variables are independent; there may exist some nonlinear relationship.
vii) The existence of a correlation coefficient between two variables does not necessarily imply that there exists a cause-and-effect relationship.

Proof:

iii) Suppose we have n pairs of observations (x_1, y_1), (x_2, y_2), …, (x_n, y_n), with

r_xy = cov(x, y) / (s_x s_y).

Change origin and scale: u_i = (x_i − a)/b, v_i = (y_i − c)/d. Then

u_i − ū = (x_i − x̄)/b ;  v_i − v̄ = (y_i − ȳ)/d ;  r_uv = cov(u, v)/(s_u s_v),

cov(u, v) = (1/(bd)) cov(x, y),

V(u) = V(x)/b² ⇒ s_u = s_x/|b|,
V(v) = V(y)/d² ⇒ s_v = s_y/|d|,

so

r_uv = (|b||d|/(bd)) · cov(x, y)/(s_x s_y) = (|b||d|/(bd)) r_xy.

If b, d are of the same sign then |b||d|/(bd) = +1, and in that case r_xy = r_uv.
If b, d are of opposite signs then |b||d|/(bd) = −1, and in that case r_xy = −r_uv.
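A quick numeric check of this invariance, with made-up data and shift/scale constants (an illustrative sketch only):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(10, 3, 100)
y = 0.7 * x + rng.normal(0, 1, 100)

r = lambda u, v: np.corrcoef(u, v)[0, 1]

u = (x - 5.0) / 2.0      # b = 2 > 0
v = (y - 1.0) / 4.0      # d = 4 > 0
print(np.isclose(r(u, v), r(x, y)))      # same-sign scales: r unchanged

w = (y - 1.0) / -4.0     # d = -4: opposite sign
print(np.isclose(r(u, w), -r(x, y)))     # sign of r flips
```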

Proof:

iv) Suppose we have n pairs of observations (x_1, y_1), (x_2, y_2), …, (x_n, y_n), with r_xy = cov(x, y)/(s_x s_y).

Let us define u_i = (x_i − x̄)/s_x , v_i = (y_i − ȳ)/s_y. Then

Σ u_i² = Σ (x_i − x̄)²/s_x² = n V(x)/V(x) = n ;  similarly Σ v_i² = n,

Σ u_i v_i = Σ (x_i − x̄)(y_i − ȳ)/(s_x s_y) = n r_xy.

Now Σ (u_i ± v_i)² ≥ 0 ⇒ 2n(1 ± r_xy) ≥ 0 ⇒ 1 ± r_xy ≥ 0 ⇒ r_xy ≥ −1 and r_xy ≤ 1 ⇒ −1 ≤ r_xy ≤ 1.
Note:

Discuss the cases of equality. r_xy = −1 ⇒ Σ (u_i + v_i)² = 0 ⇒ u_i = −v_i, i.e.

y_i = ȳ − (s_y/s_x)(x_i − x̄),

that is, there exists an exact linear relationship between x and y with negative slope.

Again, r_xy = 1 ⇒ u_i = v_i, i.e.

y_i = ȳ + (s_y/s_x)(x_i − x̄),

that is, there exists an exact linear relationship between x and y with positive slope.

Note:

If ax + by + c = 0, then find the correlation coefficient between x and y.

Ans.

Taking means, ax̄ + bȳ + c = 0, so a(x_i − x̄) = −b(y_i − ȳ): an exact linear relation with slope −a/b. Hence

r_xy = (−a/b) |b/a| = ±1.

If a, b are of the same sign then |b/a| = b/a, and r_xy = −1.
If a, b are of opposite signs then |b/a| = −b/a, and r_xy = +1.

What do you mean by regression?

By regression of a variable y on another variable x, we mean the dependence of y on x on an average. If y is expressed as a mathematical function of x as y = f(x), then y = f(x) is called the regression equation of y on x.

Normal equations and derivation of the regression lines:

Here we use the method of least squares: minimising the error sum of squares

Σ e_i² = Σ (y_i − a − b x_i)² = f(a, b) = f.

i) ∂f/∂a = 0 ⇒ 2 Σ (y_i − a − b x_i)(−1) = 0
ii) ∂f/∂b = 0 ⇒ 2 Σ (y_i − a − b x_i)(−x_i) = 0

Both are the normal equations. Solving i) × Σ x_i and ii) × n (to eliminate a), we get

b = cov(x, y)/V(x) = b̂ (say).

Putting this value of b into i), the estimated value of a is â = ȳ − b̂ x̄.

Hence the regression equation of y on x is

Y = â + b̂ x = ȳ + b̂ (x − x̄) = ȳ + [cov(x, y)/V(x)] (x − x̄) ,  b_yx = cov(x, y)/V(x).

b_yx is called the regression coefficient of y on x.
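The estimates can be computed directly; a minimal numpy sketch with made-up data, cross-checked against numpy's built-in least-squares line fit:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 2.8, 4.1, 4.9, 6.2])

# b̂ = cov(x, y) / V(x),  â = ȳ − b̂ x̄   (the 1/n factors cancel in b̂)
b_hat = np.mean((x - x.mean()) * (y - y.mean())) / np.var(x)
a_hat = y.mean() - b_hat * x.mean()

# Cross-check against numpy's least-squares degree-1 polynomial fit
b_np, a_np = np.polyfit(x, y, 1)     # returns (slope, intercept)
print(a_hat, b_hat)                  # matches (a_np, b_np)
```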

Properties of the regression equation and coefficients:

i) Mean of the observed values = mean of the predicted values, i.e. Ȳ = ȳ; equivalently
a) ē = 0.
ii) Variance of the errors: V(e) = Var(y){1 − r²}.
iii) Var(Y) = r² Var(y).
iv) The regression equations intersect at the point (x̄, ȳ).
v) The acute angle θ between the 2 regression lines is given by tan θ = [(1 − r²)/|r|] × s_x s_y/(V(x) + V(y)).
vi) A regression coefficient is independent of change of origin but depends upon change of scale.
vii) |r| = G.M. of the 2 regression coefficients.
viii) r, b_yx and b_xy always have the same sign.
ix) The A.M. of the absolute values of the regression coefficients cannot be less than the absolute value of r.
x) cov(x, e) = 0.
xi) cov(Y, e) = 0.
xii) cov(y, e) = V(e).
xiii) r between the observed and predicted values is always positive.

Proof:

i) The regression equation of y on x is Y − ȳ = b_yx (x − x̄), so

(1/n) Σ Y_i = ȳ + b_yx (1/n) Σ (x_i − x̄) = ȳ ⇒ Ȳ = ȳ.

a) ē = (1/n) Σ e_i = (1/n) Σ y_i − (1/n) Σ Y_i = ȳ − Ȳ = 0.

ii) V(e) = (1/n) Σ e_i²  [∵ ē = 0]
= (1/n) Σ (y_i − ȳ − b_yx (x_i − x̄))²
= V(y) + b_yx² V(x) − 2 b_yx cov(x, y) = V(y) + r² V(y) − 2r² V(y) = V(y)(1 − r²),

∵ b_yx = cov(x, y)/V(x) = r s_y/s_x. Incidentally, V(e) ≥ 0 ⇒ r² ≤ 1 ⇒ −1 ≤ r ≤ 1.

iii) Y_i = ȳ + b_yx (x_i − x̄) ⇒ V(Y) = (1/n) Σ (Y_i − ȳ)² = b_yx² (1/n) Σ (x_i − x̄)² = b_yx² V(x) = r² V(y).

iv) Substituting y − ȳ = b_yx (x − x̄) into x − x̄ = b_xy (y − ȳ) gives (x − x̄) = b_xy b_yx (x − x̄); since b_xy b_yx = r² ≠ 1 in general, x = x̄ and hence y = ȳ: the lines intersect at (x̄, ȳ).

v) m_1 = b_yx , m_2 = 1/b_xy ⇒

tan θ = |(m_1 − m_2)/(1 + m_1 m_2)| = |(b_yx b_xy − 1)/(b_xy + b_yx)| = [(1 − r²)/|r|] × s_x s_y/(V(x) + V(y)),

since b_yx b_xy = r² and b_yx + b_xy = r (V(x) + V(y))/(s_x s_y).

Special case:

When r = 0 ⇒ tan θ = ∞ ⇒ θ = 90°: the two regression lines are perpendicular.
When r = ±1 ⇒ tan θ = 0 ⇒ θ = 0° or 180°: as they are intersecting lines, when r = ±1 they must coincide.
vi) Suppose u = (x − a)/b ; v = (y − c)/d. Then

b_xy = cov(x, y)/V(y) = bd cov(u, v)/(d² V(v)) = (b/d) b_uv.

It is independent of the change of origin but depends upon the change of scale.

vii) √(b_yx b_xy) = √(r²) = |r|.

viii) b_yx = r s_y/s_x ; b_xy = r s_x/s_y ⇒ it is obvious that the sign depends on the sign of r.

ix) (|b_yx| + |b_xy|)/2 = (|r|/2) × (V(x) + V(y))/(s_x s_y) ≥ |r|, since V(x) + V(y) ≥ 2 s_x s_y (A.M. ≥ G.M.).

x) cov(x, e) = (1/n) Σ x_i e_i − x̄ ē = (1/n) Σ x_i e_i = (1/n) Σ x_i (y_i − â − b̂ x_i) = 0 (2nd normal equation).

xi) cov(Y, e) = (1/n) Σ Y_i e_i = (1/n) Σ (â + b̂ x_i) e_i = â ē + b̂ (1/n) Σ x_i e_i = 0.

xii) cov(y, e) = (1/n) Σ y_i e_i = (1/n) Σ (e_i + Y_i) e_i = (1/n) Σ e_i² = V(e).

xiii) r_Yy = cov(Y, y)/(s_Y s_y) = b̂ cov(x, y)/(|b̂| s_x s_y) = (b̂/|b̂|) r_xy = |r_xy| ≥ 0, since b̂ and r_xy have the same sign.
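These residual properties are easy to verify numerically; a minimal sketch with simulated data (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)

b = np.mean((x - x.mean()) * (y - y.mean())) / np.var(x)
a = y.mean() - b * x.mean()
Y = a + b * x                      # predicted values
e = y - Y                          # residuals

cov = lambda u, v: np.mean((u - u.mean()) * (v - v.mean()))
print(np.isclose(e.mean(), 0))            # i)   ē = 0
print(np.isclose(cov(x, e), 0))           # x)   cov(x, e) = 0
print(np.isclose(cov(Y, e), 0))           # xi)  cov(Y, e) = 0
print(np.isclose(cov(y, e), np.var(e)))   # xii) cov(y, e) = V(e)
```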

Explained and unexplained variation:

y_i = Y_i + e_i ⇒ V(y) = V(Y) + V(e)  [∵ cov(Y, e) = 0]

Total variation = explained variation + unexplained variation.

Dividing through by the total variation,

1 = Σ (Y_i − Ȳ)²/Σ (y_i − ȳ)² + Σ e_i²/Σ (y_i − ȳ)² ,

and Σ (Y_i − Ȳ)²/Σ (y_i − ȳ)² = proportion of the total variation explained by the regression line.

This is also known as the coefficient of determination and denoted by R²:

R² = Σ (Y_i − Ȳ)²/Σ (y_i − ȳ)² = b_yx² Σ (x_i − x̄)²/Σ (y_i − ȳ)² = b_yx² V(x)/V(y) = [r² V(y)/V(x)] × [V(x)/V(y)] = r².

1 − R² = K² → known as the coefficient of non-determination.
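A short numeric sketch of R² = r² with simulated data (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=100)
y = 1.5 * x + rng.normal(size=100)

b = np.mean((x - x.mean()) * (y - y.mean())) / np.var(x)
Y = y.mean() + b * (x - x.mean())          # fitted values

R2 = np.sum((Y - Y.mean())**2) / np.sum((y - y.mean())**2)   # explained share
r = np.corrcoef(x, y)[0, 1]
print(np.isclose(R2, r**2))                # coefficient of determination = r²
```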


Rank Correlation:

What do you mean by ranking?

An ordered arrangement of individuals according to the degree to which they possess the characteristic under study is called ranking.

What do you mean by rank?

The ordinal number given to an individual in a ranking is called its rank.

What do you mean by rank r of an individual?

By rank r of an individual we mean that there are r − 1 individuals who possess the characteristic under study to a higher degree than that individual.

Rank Correlation:

The association between 2 series of ranks is known as rank correlation. It is measured by Spearman's rank correlation coefficient.

Spearman's rank correlation coefficient is nothing but Pearson's product-moment correlation coefficient between the two series of ranks. It is given by

r_R or r_S = 1 − 6 Σ d_i² / (n(n² − 1)) ,

where d_i = u_i − v_i is the difference between the two ranks of the i-th individual.

Derivation of the Spearman rank correlation coefficient formula (in case of no ties):

Suppose we have n pairs of ranks (u_1, v_1), (u_2, v_2), …, (u_n, v_n) for n individuals, and

r_R = cov(u, v)/(s_u s_v).

Since the ranks u_i are the numbers 1, 2, …, n in some order,

V(u) = (1/n) Σ u_i² − ū² = (n + 1)(2n + 1)/6 − (n + 1)²/4 = (n² − 1)/12 ;

similarly V(v) = (n² − 1)/12.

Suppose d_i = u_i − v_i ⇒ d̄ = ū − v̄ = 0 ⇒ d_i = d_i − d̄ = (u_i − ū) − (v_i − v̄) ⇒

d_i² = (u_i − ū)² + (v_i − v̄)² − 2(u_i − ū)(v_i − v̄) ,

(1/n) Σ d_i² = V(u) + V(v) − 2 cov(u, v) ,

cov(u, v) = V(u) − (1/(2n)) Σ d_i² .

Hence

r_R = cov(u, v)/(s_u s_v) = 1 − Σ d_i²/(2n V(u)) = 1 − 6 Σ d_i²/(n(n² − 1)).
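A minimal sketch computing r_S on made-up score data (the values are illustrative; the rank trick below assumes no tied values, matching the no-tie case above):

```python
import numpy as np

x = np.array([86.0, 71, 77, 68, 91, 72, 78, 98, 60, 62])   # e.g. marks in test
y = np.array([88.0, 77, 76, 64, 96, 72, 65, 90, 63, 61])   # e.g. marks in exam

rank = lambda a: a.argsort().argsort() + 1    # ranks 1..n (valid: no ties here)
u, v = rank(x), rank(y)
n = len(x)
d = u - v

r_S = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))   # Spearman's formula (no ties)
print(r_S, np.corrcoef(u, v)[0, 1])             # equals Pearson's r on the ranks
```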
Prove that −1 ≤ r_R ≤ 1:

Suppose we have n pairs of ranks (u_1, v_1), (u_2, v_2), …, (u_n, v_n).

Now Σ d_i² ≥ 0 ⇒ 6 Σ d_i²/(n(n² − 1)) ≥ 0 ⇒ 1 − 6 Σ d_i²/(n(n² − 1)) ≤ 1 ⇒ r_R ≤ 1 .......... (1)

Again, d = u − v; let d′ = u + v. Then V(d) + V(d′) = 2(V(u) + V(v)) = 4V(u), and V(d′) ≥ 0 ⇒ V(d) ≤ 4V(u). Since d̄ = 0, (1/n) Σ d_i² = V(d), so

(1/n) Σ d_i² ≤ 4(n² − 1)/12 ⇒ 6 Σ d_i²/(n(n² − 1)) ≤ 2 ⇒ 1 − 6 Σ d_i²/(n(n² − 1)) ≥ −1 ⇒ r_R ≥ −1 .......... (2)

From (1) & (2) we get −1 ≤ r_R ≤ 1.

Discuss the cases of equality:

i) Equality holds in (1), i.e. r_R = 1 ⇒ Σ d_i² = 0 ⇒ u_i = v_i ∀ i:

that is, there exists a perfect agreement between the two sets of ranks.

ii) Equality holds in (2), i.e. r_R = −1 ⇒ V(d′) = 0 ⇒ d′ = constant ⇒ u_i + v_i = k. Summing over i,

nk = Σ (u_i + v_i) = [n(n + 1)/2] × 2 = n(n + 1) ⇒ k = n + 1 ⇒ u_i = (n + 1) − v_i ,

that is, there exists a perfect disagreement between the two sets of ranks.

Residual variance:
The error variance V(e) is called the residual variance.

−×−

SOME PROBLEMS ON BIVARIATE ANALYSIS WITH THEIR SOLUTIONS (SYMBOLS HAVE THEIR USUAL MEANINGS):

1) Two variates have the least-squares regression lines x + 4y + 3 = 0 and 4x + 9y + 5 = 0. Find their mean values and r, and V(x) : V(y) = ?
Ans.

Taking x + 4y + 3 = 0 as the regression line of y on x: y = −3/4 − (1/4)x ⇒ b_yx = −1/4,

and 4x + 9y + 5 = 0 as the regression line of x on y: x = −5/4 − (9/4)y ⇒ b_xy = −9/4.

∴ r² = b_yx · b_xy = 9/16 ⇒ r = −3/4 (negative, since both regression coefficients are negative).

b_yx/b_xy = [cov(x, y)/V(x)] / [cov(x, y)/V(y)] = V(y)/V(x) = (1/4)/(9/4) = 1/9 ⇒ V(x) : V(y) = 9 : 1.

The regression equations intersect at (x̄, ȳ); solving these two equations we get x̄ = 1 and ȳ = −1.
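The intersection can be checked with a linear solve (a quick illustrative sketch):

```python
import numpy as np

# The regression lines x + 4y = -3 and 4x + 9y = -5 meet at (x̄, ȳ)
A = np.array([[1.0, 4.0], [4.0, 9.0]])
rhs = np.array([-3.0, -5.0])
print(np.linalg.solve(A, rhs))    # [ 1. -1.]  →  x̄ = 1, ȳ = -1
```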

2) Two positively correlated variables x and y have variances V(x) and V(y) respectively. Determine the value of the constant a such that x + ay and x + (s_x/s_y) y are uncorrelated.

Ans.
For uncorrelated variables we know that

Cov(x + ay, x + (s_x/s_y) y) = 0 ⇒ (1/n) Σ {(x − x̄) + a(y − ȳ)}{(x − x̄) + (s_x/s_y)(y − ȳ)} = 0

⇒ V(x) + (s_x/s_y) Cov(x, y) + a Cov(x, y) + a (s_x/s_y) V(y) = 0

⇒ a (s_x s_y + Cov(x, y)) + (s_x/s_y)(s_x s_y + Cov(x, y)) = 0  [∵ (s_x/s_y) V(y) = s_x s_y and V(x) = (s_x/s_y) s_x s_y]

⇒ a = −(s_x/s_y)  [∵ x, y are positively correlated, so Cov(x, y) > 0 and s_x s_y + Cov(x, y) ≠ 0].

3) Derive b_yx from the following pairs: (−2, 11), (−1, 7), (0, 5), (1, 9), (2, 12).
Ans.
x_i     y_i     x_i y_i     x_i²    y_i²
−2      11      −22         4       121
−1      7       −7          1       49
0       5       0           0       25
1       9       9           1       81
2       12      24          4       144
Σ: 0    44      4           10      420

b_yx = cov(x, y)/V(x) = [(1/5)(4) − (0)(44/5)] / [(1/5)(10)] = (4/5)/2 = 0.4,

and since x̄ = 0, â = ȳ = 44/5 = 8.8

⇒ Y = 8.8 + 0.4x
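Checking with numpy (illustrative sketch):

```python
import numpy as np

x = np.array([-2.0, -1, 0, 1, 2])
y = np.array([11.0, 7, 5, 9, 12])

b_yx = np.mean((x - x.mean()) * (y - y.mean())) / np.var(x)
print(b_yx, y.mean())    # 0.4 and 8.8  →  Y = 8.8 + 0.4x
```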
4) Give an example where r_xy = 0 but there exists an exact relationship between x and y.

Ans. Let us consider the 5 pairs of observations (−2, 4), (−1, 1), (0, 0), (1, 1), (2, 4).

We have to find r_xy; consider the following table:

x_i     y_i     x_i²    y_i²    x_i y_i
−2      4       4       16      −8
−1      1       1       1       −1
0       0       0       0       0
1       1       1       1       1
2       4       4       16      8
Σ: 0    10      10      34      0

Cov(x, y) = (1/n) Σ x_i y_i − x̄ ȳ = 0 − 0 = 0 ⇒ r_xy = 0.

But by observation we see that x and y are in the exact relation y = x².
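The same point in two lines of numpy (illustrative sketch):

```python
import numpy as np

x = np.array([-2.0, -1, 0, 1, 2])
y = x**2                           # exact nonlinear relation y = x²

print(np.corrcoef(x, y)[0, 1])     # 0.0: no *linear* association
```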
5) If y = 3x + 5, then find r_xy = ?

Ans. y_i = 3x_i + 5 ⇒ ȳ = 3x̄ + 5 ⇒ (y_i − ȳ) = 3(x_i − x̄). Now we have to find r_xy:

r_xy = cov(x, y)/(s_x s_y) = [(3/n) Σ (x_i − x̄)²] / (s_x · 3s_x) = 3V(x)/(3V(x)) = 1,

[∵ (1/n) Σ (y_i − ȳ)² = (9/n) Σ (x_i − x̄)² ⇒ s_y = 3 s_x].
6) Why do we use two regression lines?

Ans. When there is a reasonable amount of scatter, we can draw two different regression lines depending upon which variable we wish to predict:
1) The first line is the regression of y (dependent variable) on x (independent variable); it can be used to estimate the dependent variable y, given the value of x.
2) The other line is the regression of x on y; it can be used to estimate x, given the value of y.

Note: if there exists a perfect correlation (positive or negative), then the two regression lines coincide.
