BIVARIATE ANALYSIS
Bivariate data: Data on two variables recorded simultaneously for each individual of a group are called bivariate data.
Correlation: If two variables are so related that a change in the magnitude of one variable is accompanied by a change in the magnitude of the other, the two variables are said to be correlated or associated.
Positive Correlation: The correlation is positive when the two variables move in the same direction, i.e. an increase (or decrease) in one is accompanied by an increase (or decrease) in the other.
Negative Correlation: The correlation is negative when the two variables move in opposite directions, i.e. an increase in one is accompanied by a decrease in the other.
Bivariate frequency distribution: Suppose x takes the k classes x_1 – x_1′, …, x_k – x_k′ and y takes the l classes y_1 – y_1′, …, y_l – y_l′, and let f_ij denote the frequency of the (i, j)-th cell, so that

$$ n = \sum_{i=1}^{k}\sum_{j=1}^{l} f_{ij}. $$

The marginal frequencies are f_i0 = ∑_j f_ij and f_0j = ∑_i f_ij.

Marginal distribution of x:

Value of x       Frequency
x_1 – x_1′       f_10
x_2 – x_2′       f_20
…                …
x_k – x_k′       f_k0
Total            n
Marginal distribution of y:

Value of y       Frequency
y_1 – y_1′       f_01
y_2 – y_2′       f_02
…                …
y_l – y_l′       f_0l
Total            n
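For a concrete illustration, here is a minimal Python sketch (the (x, y) pairs are assumed sample data, not taken from the notes) that builds the cell frequencies f_ij and the two marginal distributions from raw bivariate observations.

```python
# Minimal sketch: bivariate frequency table and its marginal distributions
# from raw (x, y) pairs, using only the standard library.
from collections import Counter

pairs = [(1, 10), (1, 20), (2, 10), (2, 10), (2, 20), (3, 20)]  # assumed sample data

cell_freq = Counter(pairs)                      # f_ij: frequency of each (x, y) cell
marginal_x = Counter(x for x, _ in pairs)       # f_i0 = sum over j of f_ij
marginal_y = Counter(y for _, y in pairs)       # f_0j = sum over i of f_ij

n = sum(cell_freq.values())
assert n == sum(marginal_x.values()) == sum(marginal_y.values())  # both marginals total n

print("n =", n)
print("marginal of x:", dict(marginal_x))
print("marginal of y:", dict(marginal_y))
```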
Scatter Diagram: A scatter diagram is one of the diagrammatic representations of bivariate data. It is the diagram of points obtained by plotting each pair of observations (x_i, y_i), i = 1, …, n, as a point with respect to two mutually perpendicular coordinate axes.
[Figure: four scatterplots of y vs x, panels (i)–(iv), illustrating different patterns of relationship between the two variables.]
Correlation coefficient: Karl Pearson's product moment correlation coefficient between x and y is

$$ r = \frac{\frac{1}{n}\sum_{i=1}^{n} x_i' y_i'}{s_x s_y} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{s_x s_y}, \qquad \text{where } x_i' = x_i-\bar{x},\; y_i' = y_i-\bar{y}. $$

r > 0 : positive linear relation
r < 0 : negative linear relation
r = 0 : no linear relation
Equivalently,

$$ r = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{s_x s_y} = \frac{\operatorname{cov}(x,y)}{s_x s_y}, \qquad \operatorname{cov}(x,y) = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}). $$

Expanding the product,

$$ \operatorname{cov}(x,y) = \frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\bar{y} - \bar{x}\bar{y} + \bar{x}\bar{y} = \frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\bar{y}, $$

so that

$$ r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n^2\, s_x s_y}. $$
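As an illustration, here is a minimal Python sketch (with assumed sample data) that computes r directly from cov(x, y)/(s_x s_y), using the divide-by-n convention adopted above.

```python
# Minimal sketch: Pearson's r from the definition, population (1/n) convention.
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    cov = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / n
    sx = sqrt(sum((xi - xbar) ** 2 for xi in x) / n)
    sy = sqrt(sum((yi - ybar) ** 2 for yi in y) / n)
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]        # assumed sample data
y = [2, 4, 5, 4, 6]
print(pearson_r(x, y))      # same value as the computational form
# computational form: r = (n*Σxy − Σx*Σy) / (n² * s_x * s_y)
```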
Some properties of covariance (a, b, c, d, k, l are constants):
i) Cov(a, b) = 0
ii) Cov(ax + b, c) = 0
iii) Cov(ax + b, cy + d) = ac Cov(x, y)
iv) Cov(ax + by + k, cx + dy + l) = ac V(x) + bd V(y) + (ad + bc) Cov(x, y)
Proof:
i) Here x_i = a and y_i = b for all i, so x̄ = a and ȳ = b. Hence
Cov(a, b) = (1/n) ∑ (x_i − x̄)(y_i − ȳ) = (1/n) × 0 = 0.
ii) Do it yourself.
iii) Let p_i = ax_i + b and q_i = cy_i + d for all i, so p̄ = ax̄ + b and q̄ = cȳ + d. Then
Cov(p, q) = (1/n) ∑ a(x_i − x̄) · c(y_i − ȳ) = ac Cov(x, y).
iv) Do it yourself.
Variance of a linear combination: V(ax + by + c) = a² V(x) + b² V(y) + 2ab Cov(x, y).
Proof:
Let u_i = ax_i + by_i + c, so ū = ax̄ + bȳ + c and u_i − ū = a(x_i − x̄) + b(y_i − ȳ). Then

$$ V(ax+by+c) = V(u) = \frac{1}{n}\sum \big\{ a^2 (x_i-\bar{x})^2 + b^2 (y_i-\bar{y})^2 + 2ab (x_i-\bar{x})(y_i-\bar{y}) \big\} = a^2 V(x) + b^2 V(y) + 2ab\, \operatorname{cov}(x,y). $$
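These identities can be checked numerically. The sketch below uses randomly generated data (an assumption, purely for illustration) and the 1/n definitions of variance and covariance.

```python
# Numeric check of V(ax + by + c) = a²V(x) + b²V(y) + 2ab Cov(x, y)
# and of property (iii) Cov(ax + b, cy + d) = ac Cov(x, y).
import random

def mean(z):
    return sum(z) / len(z)

def cov(p, q):
    mp, mq = mean(p), mean(q)
    return sum((pi - mp) * (qi - mq) for pi, qi in zip(p, q)) / len(p)

def var(z):
    return cov(z, z)

random.seed(0)
x = [random.gauss(0, 1) for _ in range(1000)]   # assumed simulated data
y = [random.gauss(0, 1) for _ in range(1000)]
a, b, c, d = 2.0, -3.0, 0.5, 4.0

lhs = var([a * xi + b * yi + c for xi, yi in zip(x, y)])
rhs = a**2 * var(x) + b**2 * var(y) + 2 * a * b * cov(x, y)
print(abs(lhs - rhs) < 1e-9)                    # True: the variance identity holds

lhs3 = cov([a * xi + b for xi in x], [c * yi + d for yi in y])
print(abs(lhs3 - a * c * cov(x, y)) < 1e-9)     # True: property (iii) holds
```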
Limits of the correlation coefficient: −1 ≤ r_xy ≤ 1.
Proof:
Put u_i = (x_i − x̄)/s_x and v_i = (y_i − ȳ)/s_y, so that (1/n) ∑ u_i² = (1/n) ∑ v_i² = 1 and (1/n) ∑ u_i v_i = r_xy. Since (1/n) ∑ (u_i ± v_i)² ≥ 0, we get 2 ± 2 r_xy ≥ 0, i.e.

r_xy² ≤ 1 ⇒ −1 ≤ r_xy ≤ 1.
Note (cases of equality):
r_xy = −1 ⇒ u_i = −v_i for all i, i.e.

$$ y_i = -\left(\frac{s_y}{s_x}\right)(x_i - \bar{x}) + \bar{y}, $$

that is, there exists an exact linear relationship between x and y with negative slope.
Again, r_xy = 1 ⇒ u_i = v_i for all i, i.e.

$$ y_i = \left(\frac{s_y}{s_x}\right)(x_i - \bar{x}) + \bar{y}, $$

that is, there exists an exact linear relationship between x and y with positive slope.
Note:
If ax + by + c = 0 holds exactly, find the correlation coefficient between x and y.
Ans:
a(x_i − x̄) = −b(y_i − ȳ), so Cov(x, y) = −(a/b) V(x) and s_y = |a/b| s_x, giving

$$ r_{xy} = \frac{\operatorname{cov}(x,y)}{s_x s_y} = \left(-\frac{a}{b}\right)\left|\frac{b}{a}\right|. $$

If a and b have the same sign, then |b/a| = b/a and r_xy = −1.
If a and b have opposite signs, then |b/a| = −b/a and r_xy = +1.
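A quick numerical illustration of this note, with arbitrarily chosen constants a, b, c (assumed values, not from the notes):

```python
# When ax + by + c = 0 holds exactly, r_xy = -1 for a, b of the same sign
# and r_xy = +1 for a, b of opposite signs.
import numpy as np

x = np.linspace(0.0, 10.0, 50)

a, b, c = 2.0, 3.0, -4.0          # same sign: expect r ≈ -1
y = -(a * x + c) / b
print(np.corrcoef(x, y)[0, 1])    # ≈ -1

a, b = 2.0, -3.0                  # opposite signs: expect r ≈ +1
y = -(a * x + c) / b
print(np.corrcoef(x, y)[0, 1])    # ≈ +1
```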
Properties of the least squares regression of y on x (Y_i denotes the fitted value and e_i = y_i − Y_i the residual), with proofs:
i) The regression equation of y on x is Y − ȳ = b_yx (x − x̄). Hence
(1/n) ∑ Y_i = ȳ + b_yx (1/n) ∑ (x_i − x̄) = ȳ, i.e. Ȳ = ȳ.
a) ē = (1/n) ∑ e_i = (1/n) ∑ y_i − (1/n) ∑ Y_i = ȳ − Ȳ = 0.
ii) Since ē = 0,

$$ V(e) = \frac{1}{n}\sum e_i^2 = \frac{1}{n}\sum \big( y_i - \bar{y} - b_{yx}(x_i - \bar{x}) \big)^2 = V(y) + b_{yx}^2 V(x) - 2 b_{yx}\operatorname{cov}(x,y) = V(y) + r^2 V(y) - 2 r^2 V(y) = V(y)(1 - r^2), $$

using b_yx = cov(x, y)/V(x) = r · s_y/s_x. Moreover V(e) ≥ 0 ⇒ r² ≤ 1 ⇒ −1 ≤ r ≤ 1.
iii) Y_i = ȳ + b_yx (x_i − x̄), so

$$ V(Y) = \frac{1}{n}\sum (Y_i - \bar{y})^2 = b_{yx}^2 \cdot \frac{1}{n}\sum (x_i - \bar{x})^2 = r^2 V(y). $$

iv) The two regression lines intersect at (x̄, ȳ): if (x, y) lies on both y − ȳ = b_yx (x − x̄) and x − x̄ = b_xy (y − ȳ), then x − x̄ = b_xy b_yx (x − x̄); since b_xy b_yx = r² ≠ 1 in general, x = x̄ and hence y = ȳ.
v) Angle between the two regression lines: the slopes are m_1 = b_yx and m_2 = 1/b_xy, so

$$ \tan\theta = \left|\frac{m_1 - m_2}{1 + m_1 m_2}\right| = \left|\frac{b_{yx} b_{xy} - 1}{b_{xy}} \times \frac{b_{xy}}{b_{xy} + b_{yx}}\right| = \frac{1 - r^2}{|r|} \times \frac{s_x s_y}{V(x) + V(y)}. $$
Special case:
x) If r = ±1 then tan θ = 0, so the two regression lines coincide; if r = 0 then tan θ → ∞, so θ = π/2 and the two regression lines are perpendicular.
xi) Cov(x, e) = (1/n) ∑ x_i e_i − x̄ē = (1/n) ∑ x_i e_i = (1/n) ∑ x_i (y_i − a − b x_i) = 0 (the 2nd normal equation).
xii) Cov(Y, e) = (1/n) ∑ Y_i e_i = (1/n) ∑ (a + b x_i) e_i = a · 0 + b · (1/n) ∑ x_i e_i = 0.
xiii) Cov(y, e) = (1/n) ∑ y_i e_i = (1/n) ∑ (Y_i + e_i) e_i = (1/n) ∑ e_i² = V(e).
xiv) r_Yy = Cov(Y, y)/(s_Y s_y) = b Cov(x, y)/(|b| s_x s_y) = (b/|b|) r_xy.

$$ R^2 = \frac{\sum (Y_i - \bar{Y})^2}{\sum (y_i - \bar{y})^2} = \frac{b^2 \sum (x_i - \bar{x})^2}{\sum (y_i - \bar{y})^2} = b^2\,\frac{V(x)}{V(y)} = r^2\,\frac{V(y)}{V(x)}\cdot\frac{V(x)}{V(y)} = r^2. $$
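The residual properties above (ē = 0, V(e) = V(y)(1 − r²), Cov(x, e) = Cov(Y, e) = 0, R² = r²) can be verified numerically. The following Python sketch uses simulated data (an assumption for illustration only) and the 1/n moment definitions.

```python
# Numeric check of the least squares properties of fitted values and residuals.
import random

def mean(z):
    return sum(z) / len(z)

def cov(p, q):
    mp, mq = mean(p), mean(q)
    return sum((a - mp) * (b - mq) for a, b in zip(p, q)) / len(p)

def var(z):
    return cov(z, z)

random.seed(1)
x = [random.gauss(0, 2) for _ in range(500)]
y = [1.5 * xi + random.gauss(0, 1) for xi in x]   # linear signal plus noise (assumed)

b_yx = cov(x, y) / var(x)                 # regression coefficient of y on x
a = mean(y) - b_yx * mean(x)              # intercept
Y = [a + b_yx * xi for xi in x]           # fitted values
e = [yi - Yi for yi, Yi in zip(y, Y)]     # residuals
r2 = cov(x, y) ** 2 / (var(x) * var(y))   # r squared

print(abs(mean(e)) < 1e-9)                           # ē = 0
print(abs(var(e) - var(y) * (1 - r2)) < 1e-9)        # V(e) = V(y)(1 − r²)
print(abs(cov(x, e)) < 1e-9, abs(cov(Y, e)) < 1e-9)  # residuals uncorrelated with x and Y
print(abs(var(Y) / var(y) - r2) < 1e-9)              # R² = r²
```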
Ranking: An ordered arrangement of individuals according to the degree to which they possess a characteristic under study is called ranking.
By the rank r of an individual we mean that there are r − 1 individuals who possess the characteristic under study to a higher degree than that individual.
Rank Correlation:
The association between two series of ranks is known as rank correlation. It is measured by Spearman's rank correlation coefficient.
Spearman's rank correlation coefficient is nothing but Pearson's product moment correlation coefficient computed between the two series of ranks. It is given by

$$ r_R \text{ or } r_S = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}, $$

where d_i = u_i − v_i is the difference between the two ranks of the i-th individual.
Derivation: If u_i and v_i denote the two ranks, then

$$ r_R = \frac{\operatorname{Cov}(u, v)}{s_u s_v}. $$

Since ū = v̄ = (n + 1)/2 and V(u) = V(v) = (n² − 1)/12,

$$ \frac{1}{n}\sum d_i^2 = V(u) + V(v) - 2\operatorname{Cov}(u, v) \;\Rightarrow\; \operatorname{Cov}(u, v) = V(u) - \frac{1}{2n}\sum d_i^2, $$

so that

$$ r_R = \frac{\operatorname{Cov}(u, v)}{V(u)} = 1 - \frac{\frac{1}{2n}\sum d_i^2}{\frac{n^2 - 1}{12}} = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}. $$

In particular r_R = 1 ⇔ ∑ d_i² = 0 ⇔ d_i = 0 for all i, that is, when there exists perfect agreement between the two sets of ranks.
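A minimal Python sketch of the formula (the marks below are assumed sample data, and the ranks are taken to be distinct, i.e. no ties):

```python
# Spearman's rank correlation via r_S = 1 − 6Σd² / (n(n² − 1)), no ties assumed.
def ranks(values):
    # rank 1 goes to the largest value (r − 1 individuals exceed the one ranked r)
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r

maths   = [78, 36, 98, 25, 75, 82, 90, 62]   # assumed marks of 8 individuals
physics = [84, 51, 91, 60, 68, 62, 86, 58]

u, v = ranks(maths), ranks(physics)
n = len(u)
d2 = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
r_s = 1 - 6 * d2 / (n * (n ** 2 - 1))
print(round(r_s, 3))
```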
Residual variance:
The variance of the errors (residuals), V(e) = V(y)(1 − r²), is called the residual variance.
SOME PROBLEMS ON BIVARIATE ANALYSIS WITH THEIR SOLUTIONS (SYMBOLS HAVE THEIR USUAL MEANINGS):
1) Two variates have the least squares regression lines x + 4y + 3 = 0 and 4x + 9y + 5 = 0. Find their mean values, r, and V(x) : V(y).
Ans.
x + 4y = −3 ⇒ y = −3/4 − (1/4)x ⇒ b_yx = −1/4,
and 4x + 9y = −5 ⇒ x = −5/4 − (9/4)y ⇒ b_xy = −9/4.
∴ r² = b_yx · b_xy = 9/16 ⇒ r = −3/4 (negative, since both regression coefficients are negative).

$$ \frac{b_{xy}}{b_{yx}} = \frac{V(x)}{V(y)} = \frac{9}{1}, \quad \text{i.e. } V(x) : V(y) = 9 : 1. $$

The regression lines intersect at (x̄, ȳ); solving the two equations gives x̄ = 1 and ȳ = −1.
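As a quick check of this solution (a sketch using numpy; the library choice is mine, not the notes'):

```python
# The means are the intersection of the two regression lines, and r² = b_yx · b_xy.
import numpy as np

# x + 4y + 3 = 0  and  4x + 9y + 5 = 0
A = np.array([[1.0, 4.0], [4.0, 9.0]])
c = np.array([-3.0, -5.0])
x_bar, y_bar = np.linalg.solve(A, c)
print(x_bar, y_bar)                  # 1.0  -1.0

b_yx, b_xy = -1 / 4, -9 / 4          # read off the two lines as in the solution
r = -np.sqrt(b_yx * b_xy)            # negative root, since both coefficients are negative
print(r)                             # -0.75
```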
2) Two positively correlated variables x and y have variances V(x) and V(y) respectively. Determine the value of the constant a such that x + ay and x + (s_x/s_y) y are uncorrelated.
Ans.
For the two combinations to be uncorrelated we need

$$ \operatorname{Cov}\!\left(x + ay,\; x + \frac{s_x}{s_y}\, y\right) = 0 \;\Rightarrow\; \frac{1}{n}\sum \Big\{ (x - \bar{x}) + a (y - \bar{y}) \Big\} \Big\{ (x - \bar{x}) + \frac{s_x}{s_y} (y - \bar{y}) \Big\} = 0 $$

$$ \Rightarrow\; V(x) + \frac{s_x}{s_y}\operatorname{Cov}(x, y) + a \operatorname{Cov}(x, y) + a\,\frac{s_x}{s_y}\, V(y) = 0 $$

$$ \Rightarrow\; a\big(s_x s_y + \operatorname{Cov}(x, y)\big) + \frac{s_x}{s_y}\big(s_x s_y + \operatorname{Cov}(x, y)\big) = 0 $$

$$ \Rightarrow\; a = -\frac{s_x}{s_y}, \quad \text{since } s_x s_y + \operatorname{Cov}(x, y) \neq 0 \text{ (x and y are positively correlated)}. $$
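A numerical check of this result with simulated positively correlated data (the data-generating choices below are assumed, purely for illustration):

```python
# With a = −s_x/s_y the two combinations should be (essentially) uncorrelated.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, 2000)
y = 0.8 * x + rng.normal(0.0, 1.0, 2000)   # positively correlated with x (assumed model)

sx, sy = x.std(), y.std()                  # population (1/n) standard deviations
a = -sx / sy

u = x + a * y
v = x + (sx / sy) * y
print(np.corrcoef(u, v)[0, 1])             # ~0 (exactly 0 in exact arithmetic)
```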
3) Derive b_yx from the following pairs: (−2, 11), (−1, 7), (0, 5), (1, 9), (2, 12).
Ans.
x_i     y_i     x_i y_i    x_i²    y_i²
−2      11      −22        4       121
−1      7       −7         1       49
0       5       0          0       25
1       9       9          1       81
2       12      24         4       144
Total: 0   44   4   10   420

$$ b_{yx} = \frac{\operatorname{Cov}(x, y)}{V(x)} = \frac{4/5}{10/5} = 0.4, \qquad \bar{x} = 0,\; \bar{y} = \frac{44}{5} = 8.8 $$

⇒ y = 8.8 + 0.4x
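A quick numpy check of the fitted line (the library choice is an assumption of this sketch):

```python
# The least squares line through the five points has slope 0.4 and intercept 8.8.
import numpy as np

x = np.array([-2, -1, 0, 1, 2], dtype=float)
y = np.array([11, 7, 5, 9, 12], dtype=float)

slope, intercept = np.polyfit(x, y, 1)    # degree-1 least squares fit
print(slope, intercept)                   # 0.4  8.8
```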
4) Give an example where r_xy = 0 but there exists an exact relationship between x and y.
Ans. Consider the 5 pairs of observations (−2, 4), (−1, 1), (0, 0), (1, 1), (2, 4).

x_i     y_i     x_i²    y_i²    x_i y_i
−2      4       4       16      −8
−1      1       1       1       −1
0       0       0       0       0
1       1       1       1       1
2       4       4       16      8

Here ∑ x_i y_i = 0 and x̄ = 0, so

Cov(x, y) = (1/n) ∑ x_i y_i − x̄ȳ = 0 ⇒ r_xy = 0.

But by inspection x and y are connected by the exact (non-linear) relation y = x².
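A one-line check with numpy (assumed tooling):

```python
# r_xy = 0 for these points even though y = x² holds exactly.
import numpy as np

x = np.array([-2, -1, 0, 1, 2], dtype=float)
y = x ** 2                                 # exact (non-linear) relationship
print(np.corrcoef(x, y)[0, 1])             # 0.0
```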
5) If y = 3x + 5, find r_xy.
Ans. Since y = 3x + 5 is an exact linear relationship between x and y with positive slope (3 > 0), r_xy = +1.
Note on the two regression lines: when there is a reasonable amount of scatter, we can draw two different regression lines, depending upon which variable we take as the dependent one:
1) The first line is the regression of y (dependent variable) on x (independent variable); it is used to estimate y for a given value of x.
2) The other line is the regression of x on y; it is used to estimate x for a given value of y.
Note: if there is perfect correlation (positive or negative), the two regression lines coincide; here both lines coincide with y = 3x + 5.
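And a final numeric check (the x values below are assumed sample data):

```python
# For any x values with non-zero spread, y = 3x + 5 is an exact linear
# relation with positive slope, so r_xy = +1.
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
y = 3 * x + 5
print(np.corrcoef(x, y)[0, 1])    # ≈ 1.0
```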