BIVARIATE ANALYSIS

Bivariate data involves two variables that are recorded simultaneously from individuals in a group. Examples include heights and weights of students, age and blood pressure of individuals, and income and expenditure of families. Correlation refers to the association or interdependence between two variables. Positive correlation means a variable increases on average as the other increases, while negative correlation means a variable decreases as the other increases. Scatter diagrams graphically display the relationship between two variables with points plotted on a coordinate plane. Pearson's product-moment correlation coefficient, denoted by r, measures the strength and direction of linear relationships between variables on a scale from -1 to 1.

Bivariate data: Data on two variables recorded simultaneously from a group of individuals are called bivariate data.

Example: i) Heights (x) and weights (y) of students.

ii) Age (x) and blood pressure (y) of a group of individuals.

iii) Marks in a test (x) and marks in the final exam (y) of students.

iv) Income (x) and expenditure (y) of a number of families, etc.

What do you mean by correlation?

By correlation we mean association or interdependence between two variables.

If two variables are so related that a change in the magnitude of one variable is accompanied by a change in the magnitude of the other, then the two variables are said to be correlated or associated.

Positive Correlation:

If a variable is found to increase (decrease) on an average with the increase (decrease) of the other variable, then the variables are said to be positively correlated.

Negative Correlation:

If a variable is found to decrease (increase) on an average with the increase (decrease) of the other variable, then the variables are said to be negatively correlated.

Bivariate frequency distribution: For grouped data the n pairs are arranged in a two-way table, with the x classes x_1 − x_1′, …, x_k − x_k′ as rows, the y classes y_1 − y_1′, …, y_l − y_l′ as columns, and f_ij the frequency of the (i, j)-th cell:

              y_1 − y_1′   y_2 − y_2′   ...   y_j − y_j′   ...   y_l − y_l′   Total
x_1 − x_1′    f_11         f_12         ...   f_1j         ...   f_1l         f_10
x_2 − x_2′    f_21         f_22         ...   f_2j         ...   f_2l         f_20
...           ...          ...          ...   f_ij         ...   ...          f_i0
x_k − x_k′    f_k1         f_k2         ...   f_kj         ...   f_kl         f_k0
Total         f_01         f_02         ...   f_0j         ...   f_0l         n

n = Σ_{i=1}^{k} Σ_{j=1}^{l} f_ij = Σ_{i=1}^{k} f_i0 = Σ_{j=1}^{l} f_0j.
Conditional distribution of x given that y belongs to the j-th class:

Value of x      Frequency
x_1 − x_1′      f_1j
x_2 − x_2′      f_2j
...             f_ij
x_k − x_k′      f_kj
Total           f_0j

Marginal distribution of y:

Value of y      Frequency
y_1 − y_1′      f_01
y_2 − y_2′      f_02
...             f_0j
y_l − y_l′      f_0l
Total           n

Scatter Diagram: A scatter diagram is one of the diagrammatic representations of bivariate data. It is the diagram of points obtained by plotting each pair of observations (x_i, y_i) as a point on a graph with two mutually perpendicular coordinate axes.

[Figure: four scatterplots of y vs x, panels (i)–(iv), showing different patterns of association; x runs from 0 to 50 in each panel.]
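Such diagrams can be produced with any plotting library. The following is a minimal matplotlib sketch with made-up data (the sample values and the positive-slope pattern are illustrative assumptions, not taken from these notes):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(0, 50, 40)           # illustrative x values
y = 2 * x + rng.normal(0, 8, 40)     # roughly linear, positive association

plt.scatter(x, y)                    # each pair (x_i, y_i) becomes one point
plt.xlabel("x")
plt.ylabel("y")
plt.title("Scatterplot of y vs x")
plt.show()
```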

Pearson's product-moment correlation coefficient:

r = [(1/n) Σ_{i=1}^{n} x_i′ y_i′] / (s_x s_y) = [(1/n) Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ)] / (s_x s_y), where x_i′ = (x_i − x̄), y_i′ = (y_i − ȳ).

r > 0 : positive linear relation (the closer r is to ±1, the stronger the relation);
r < 0 : negative linear relation;
r = 0 : no linear relation.

Working formula for r:

r = [(1/n) Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ)] / (s_x s_y) = cov(x, y) / (s_x s_y), where cov(x, y) = (1/n) Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ).

Expanding the product,

cov(x, y) = (1/n) Σ x_i y_i − x̄ȳ − x̄ȳ + x̄ȳ = (1/n) Σ x_i y_i − x̄ȳ,

so that

r = [n Σ x_i y_i − Σ x_i Σ y_i] / (n² s_x s_y).
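As a quick sketch (not from the notes), the definition and the working formula can be checked against each other in numpy, using divide-by-n moments as in this section; the sample values are made up:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# Definition: r = cov(x, y) / (s_x * s_y), with divide-by-n moments
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
r_def = cov_xy / (x.std() * y.std())      # np.std divides by n by default

# Working formula: r = (n*Σxy − Σx*Σy) / (n² s_x s_y)
r_work = (n * np.sum(x * y) - x.sum() * y.sum()) / (n**2 * x.std() * y.std())

print(r_def, r_work, np.corrcoef(x, y)[0, 1])   # all three agree
```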

Some results on covariance:

i) Cov(a, b) = 0 (a, b constants)
ii) Cov(ax + b, c) = 0
iii) Cov(ax + b, cy + d) = ac Cov(x, y)
iv) Cov(ax + by + k, cx + dy + l) = ac Var(x) + bd Var(y) + (ad + bc) Cov(x, y)

Proof:

i) Here x_i = a, y_i = b ∀ i, so x̄ = a, ȳ = b, and

Cov(a, b) = (1/n) Σ (x_i − x̄)(y_i − ȳ) = (1/n) × 0 = 0.

ii) Do it yourself.

iii) Put p_i = ax_i + b, q_i = cy_i + d ∀ i, so that p_i − p̄ = a(x_i − x̄) and q_i − q̄ = c(y_i − ȳ). Then

Cov(p, q) = (1/n) Σ a(x_i − x̄) · c(y_i − ȳ) = ac Cov(x, y).

iv) Do it yourself.

Some results on variance:

i) V(ax + by + c) = a² V(x) + b² V(y) + 2ab Cov(x, y)
ii) V(Σ_{i=1}^{k} x_i) = Σ_{i=1}^{k} V(x_i) + 2 Σ Σ_{i<j} Cov(x_i, x_j)

Proof:

i) Put u = ax + by + c, so that ū = ax̄ + bȳ + c.

∴ a(x_i − x̄) + b(y_i − ȳ) = (u_i − ū)

V(ax + by + c) = V(u) = (1/n) Σ {a²(x_i − x̄)² + b²(y_i − ȳ)² + 2ab(x_i − x̄)(y_i − ȳ)}
= a² V(x) + b² V(y) + 2ab Cov(x, y).
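A minimal numeric check of covariance result iii) and variance result i), with arbitrary made-up constants and simulated data (an illustrative sketch, not part of the notes):

```python
import numpy as np

def cov(u, v):
    """Divide-by-n covariance, as used in these notes."""
    return np.mean((u - u.mean()) * (v - v.mean()))

rng = np.random.default_rng(1)
x, y = rng.normal(size=50), rng.normal(size=50)
a, b, c, d = 2.0, -3.0, 0.5, 4.0          # arbitrary constants

# iii) Cov(ax + b, cy + d) = ac Cov(x, y)
print(np.isclose(cov(a * x + b, c * y + d), a * c * cov(x, y)))

# i) V(ax + by + c) = a² V(x) + b² V(y) + 2ab Cov(x, y)   (np.var divides by n)
lhs = np.var(a * x + b * y + c)
rhs = a**2 * np.var(x) + b**2 * np.var(y) + 2 * a * b * cov(x, y)
print(np.isclose(lhs, rhs))
```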

Properties of the correlation coefficient:

i) It is a pure (unit-free) number.
ii) r_xy = r_yx.
iii) The numerical value of the correlation coefficient is independent of change of origin and change of scale; only its sign changes, when the scale factors have opposite signs (see the proof below).
iv) −1 ≤ r ≤ 1.
v) The correlation coefficient is not a resistant measure; it is highly affected by outliers.
vi) Correlation coefficient = 0 does not necessarily imply that the variables are independent; there may exist some nonlinear relationship.
vii) The existence of a correlation coefficient between two variables does not necessarily imply that there exists a cause-and-effect relationship.

Proof:

iii) Suppose we have n pairs of observations (x_1, y_1), (x_2, y_2), …, (x_n, y_n), with

r_xy = cov(x, y) / (s_x s_y).

Change origin and scale: u_i = (x_i − a)/b, v_i = (y_i − c)/d. Then

u_i − ū = (x_i − x̄)/b ;  v_i − v̄ = (y_i − ȳ)/d ;  r_uv = cov(u, v)/(s_u s_v),

cov(u, v) = (1/(bd)) cov(x, y),

V(u) = V(x)/b² ⇒ s_u = s_x/|b|,
V(v) = V(y)/d² ⇒ s_v = s_y/|d|,

so

r_uv = (|b||d|/(bd)) · cov(x, y)/(s_x s_y) = (|b||d|/(bd)) r_xy.

If b, d are of the same sign then |b||d|/(bd) = +1, and in that case r_xy = r_uv.
If b, d are of opposite signs then |b||d|/(bd) = −1, and in that case r_xy = −r_uv.
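A quick numeric check of this invariance, with made-up data and shift/scale constants (an illustrative sketch only):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(10, 3, 100)
y = 0.7 * x + rng.normal(0, 1, 100)

r = lambda u, v: np.corrcoef(u, v)[0, 1]

u = (x - 5.0) / 2.0      # b = 2 > 0
v = (y - 1.0) / 4.0      # d = 4 > 0
print(np.isclose(r(u, v), r(x, y)))      # same-sign scales: r unchanged

w = (y - 1.0) / -4.0     # d = -4: opposite sign
print(np.isclose(r(u, w), -r(x, y)))     # sign of r flips
```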

Proof:

iv) Suppose we have n pairs of observations (x_1, y_1), (x_2, y_2), …, (x_n, y_n), with r_xy = cov(x, y)/(s_x s_y).

Let us define u_i = (x_i − x̄)/s_x , v_i = (y_i − ȳ)/s_y. Then

Σ u_i² = Σ (x_i − x̄)²/s_x² = n V(x)/V(x) = n ;  similarly Σ v_i² = n,

Σ u_i v_i = Σ (x_i − x̄)(y_i − ȳ)/(s_x s_y) = n r_xy.

Now Σ (u_i ± v_i)² ≥ 0 ⇒ 2n(1 ± r_xy) ≥ 0 ⇒ 1 ± r_xy ≥ 0 ⇒ r_xy ≥ −1 and r_xy ≤ 1 ⇒ −1 ≤ r_xy ≤ 1.
Note:

Discuss the cases of equality. r_xy = −1 ⇒ Σ (u_i + v_i)² = 0 ⇒ u_i = −v_i, i.e.

y_i = ȳ − (s_y/s_x)(x_i − x̄),

that is, there exists an exact linear relationship between x and y with negative slope.

Again, r_xy = 1 ⇒ u_i = v_i, i.e.

y_i = ȳ + (s_y/s_x)(x_i − x̄),

that is, there exists an exact linear relationship between x and y with positive slope.

Note:

If ax + by + c = 0, then find the correlation coefficient between x and y.

Ans.

Taking means, ax̄ + bȳ + c = 0, so a(x_i − x̄) = −b(y_i − ȳ): an exact linear relation with slope −a/b. Hence

r_xy = (−a/b) |b/a| = ±1.

If a, b are of the same sign then |b/a| = b/a, and r_xy = −1.
If a, b are of opposite signs then |b/a| = −b/a, and r_xy = +1.

What do you mean by regression?

By regression of a variable y on another variable x, we mean the dependence of y on x on an average. If y is expressed as a mathematical function of x as y = f(x), then y = f(x) is called the regression equation of y on x.

Normal equations and derivation of the regression lines:

Here we use the method of least squares: minimising the error sum of squares

Σ e_i² = Σ (y_i − a − b x_i)² = f(a, b) = f.

i) ∂f/∂a = 0 ⇒ 2 Σ (y_i − a − b x_i)(−1) = 0
ii) ∂f/∂b = 0 ⇒ 2 Σ (y_i − a − b x_i)(−x_i) = 0

Both are the normal equations. Solving i) × Σ x_i and ii) × n (to eliminate a), we get

b = cov(x, y)/V(x) = b̂ (say).

Putting this value of b into i), the estimated value of a is â = ȳ − b̂ x̄.

Hence the regression equation of y on x is

Y = â + b̂ x = ȳ + b̂ (x − x̄) = ȳ + [cov(x, y)/V(x)] (x − x̄) ,  b_yx = cov(x, y)/V(x).

b_yx is called the regression coefficient of y on x.
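The estimates can be computed directly; a minimal numpy sketch with made-up data, cross-checked against numpy's built-in least-squares line fit:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 2.8, 4.1, 4.9, 6.2])

# b̂ = cov(x, y) / V(x),  â = ȳ − b̂ x̄   (the 1/n factors cancel in b̂)
b_hat = np.mean((x - x.mean()) * (y - y.mean())) / np.var(x)
a_hat = y.mean() - b_hat * x.mean()

# Cross-check against numpy's least-squares degree-1 polynomial fit
b_np, a_np = np.polyfit(x, y, 1)     # returns (slope, intercept)
print(a_hat, b_hat)                  # matches (a_np, b_np)
```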

Properties of the regression equation and coefficients:

i) Mean of the observed values = mean of the predicted values, i.e. Ȳ = ȳ; equivalently
a) ē = 0.
ii) Variance of the errors: V(e) = Var(y){1 − r²}.
iii) Var(Y) = r² Var(y).
iv) The regression equations intersect at the point (x̄, ȳ).
v) The acute angle θ between the 2 regression lines is given by tan θ = [(1 − r²)/|r|] × s_x s_y/(V(x) + V(y)).
vi) A regression coefficient is independent of change of origin but depends upon change of scale.
vii) |r| = G.M. of the 2 regression coefficients.
viii) r, b_yx and b_xy always have the same sign.
ix) The A.M. of the absolute values of the regression coefficients cannot be less than the absolute value of r.
x) cov(x, e) = 0.
xi) cov(Y, e) = 0.
xii) cov(y, e) = V(e).
xiii) r between the observed and predicted values is always positive.

Proof:

i) The regression equation of y on x is Y − ȳ = b_yx (x − x̄), so

(1/n) Σ Y_i = ȳ + b_yx (1/n) Σ (x_i − x̄) = ȳ ⇒ Ȳ = ȳ.

a) ē = (1/n) Σ e_i = (1/n) Σ y_i − (1/n) Σ Y_i = ȳ − Ȳ = 0.

ii) V(e) = (1/n) Σ e_i²  [∵ ē = 0]
= (1/n) Σ (y_i − ȳ − b_yx (x_i − x̄))²
= V(y) + b_yx² V(x) − 2 b_yx cov(x, y) = V(y) + r² V(y) − 2r² V(y) = V(y)(1 − r²),

∵ b_yx = cov(x, y)/V(x) = r s_y/s_x. Incidentally, V(e) ≥ 0 ⇒ r² ≤ 1 ⇒ −1 ≤ r ≤ 1.

iii) Y_i = ȳ + b_yx (x_i − x̄) ⇒ V(Y) = (1/n) Σ (Y_i − ȳ)² = b_yx² (1/n) Σ (x_i − x̄)² = b_yx² V(x) = r² V(y).

iv) Substituting y − ȳ = b_yx (x − x̄) into x − x̄ = b_xy (y − ȳ) gives (x − x̄) = b_xy b_yx (x − x̄); since b_xy b_yx = r² ≠ 1 in general, x = x̄ and hence y = ȳ: the lines intersect at (x̄, ȳ).

v) m_1 = b_yx , m_2 = 1/b_xy ⇒

tan θ = |(m_1 − m_2)/(1 + m_1 m_2)| = |(b_yx b_xy − 1)/(b_xy + b_yx)| = [(1 − r²)/|r|] × s_x s_y/(V(x) + V(y)),

since b_yx b_xy = r² and b_yx + b_xy = r (V(x) + V(y))/(s_x s_y).

Special case:

When r = 0 ⇒ tan θ = ∞ ⇒ θ = 90°: the two regression lines are perpendicular.
When r = ±1 ⇒ tan θ = 0 ⇒ θ = 0° or 180°: as they are intersecting lines, when r = ±1 they must coincide.
vi) Suppose u = (x − a)/b ; v = (y − c)/d. Then

b_xy = cov(x, y)/V(y) = bd cov(u, v)/(d² V(v)) = (b/d) b_uv.

It is independent of the change of origin but depends upon the change of scale.

vii) √(b_yx b_xy) = √(r²) = |r|.

viii) b_yx = r s_y/s_x ; b_xy = r s_x/s_y ⇒ it is obvious that the sign depends on the sign of r.

ix) (|b_yx| + |b_xy|)/2 = (|r|/2) × (V(x) + V(y))/(s_x s_y) ≥ |r|, since V(x) + V(y) ≥ 2 s_x s_y (A.M. ≥ G.M.).

x) cov(x, e) = (1/n) Σ x_i e_i − x̄ ē = (1/n) Σ x_i e_i = (1/n) Σ x_i (y_i − â − b̂ x_i) = 0 (2nd normal equation).

xi) cov(Y, e) = (1/n) Σ Y_i e_i = (1/n) Σ (â + b̂ x_i) e_i = â ē + b̂ (1/n) Σ x_i e_i = 0.

xii) cov(y, e) = (1/n) Σ y_i e_i = (1/n) Σ (e_i + Y_i) e_i = (1/n) Σ e_i² = V(e).

xiii) r_Yy = cov(Y, y)/(s_Y s_y) = b̂ cov(x, y)/(|b̂| s_x s_y) = (b̂/|b̂|) r_xy = |r_xy| ≥ 0, since b̂ and r_xy have the same sign.
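These residual properties are easy to verify numerically; a minimal sketch with simulated data (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)

b = np.mean((x - x.mean()) * (y - y.mean())) / np.var(x)
a = y.mean() - b * x.mean()
Y = a + b * x                      # predicted values
e = y - Y                          # residuals

cov = lambda u, v: np.mean((u - u.mean()) * (v - v.mean()))
print(np.isclose(e.mean(), 0))            # i)   ē = 0
print(np.isclose(cov(x, e), 0))           # x)   cov(x, e) = 0
print(np.isclose(cov(Y, e), 0))           # xi)  cov(Y, e) = 0
print(np.isclose(cov(y, e), np.var(e)))   # xii) cov(y, e) = V(e)
```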

Explained and unexplained variation:

y_i = Y_i + e_i ⇒ V(y) = V(Y) + V(e)  [∵ cov(Y, e) = 0]

Total variation = explained variation + unexplained variation.

Dividing through by the total variation,

1 = Σ (Y_i − Ȳ)²/Σ (y_i − ȳ)² + Σ e_i²/Σ (y_i − ȳ)² ,

and Σ (Y_i − Ȳ)²/Σ (y_i − ȳ)² = proportion of the total variation explained by the regression line.

This is also known as the coefficient of determination and denoted by R²:

R² = Σ (Y_i − Ȳ)²/Σ (y_i − ȳ)² = b_yx² Σ (x_i − x̄)²/Σ (y_i − ȳ)² = b_yx² V(x)/V(y) = [r² V(y)/V(x)] × [V(x)/V(y)] = r².

1 − R² = K² → known as the coefficient of non-determination.
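A short numeric sketch of R² = r² with simulated data (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=100)
y = 1.5 * x + rng.normal(size=100)

b = np.mean((x - x.mean()) * (y - y.mean())) / np.var(x)
Y = y.mean() + b * (x - x.mean())          # fitted values

R2 = np.sum((Y - Y.mean())**2) / np.sum((y - y.mean())**2)   # explained share
r = np.corrcoef(x, y)[0, 1]
print(np.isclose(R2, r**2))                # coefficient of determination = r²
```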


Rank Correlation:

What do you mean by ranking?

An ordered arrangement of individuals according to the degree to which they possess the characteristic under study is called ranking.

What do you mean by rank?

The ordinal number given to an individual in a ranking is called its rank.

What do you mean by rank r of an individual?

By rank r of an individual we mean that there are r − 1 individuals who possess the characteristic under study to a higher degree than that individual.

Rank Correlation:

The association between 2 series of ranks is known as rank correlation. It is measured by Spearman's rank correlation coefficient.

Spearman's rank correlation coefficient is nothing but Pearson's product-moment correlation coefficient between the two series of ranks. It is given by

r_R or r_S = 1 − 6 Σ d_i² / (n(n² − 1)) ,

where d_i = u_i − v_i is the difference between the two ranks of the i-th individual.

Derivation of the Spearman rank correlation coefficient formula (in case of no ties):

Suppose we have n pairs of ranks (u_1, v_1), (u_2, v_2), …, (u_n, v_n) for n individuals, and

r_R = cov(u, v)/(s_u s_v).

Since the ranks u_i are the numbers 1, 2, …, n in some order,

V(u) = (1/n) Σ u_i² − ū² = (n + 1)(2n + 1)/6 − (n + 1)²/4 = (n² − 1)/12 ;

similarly V(v) = (n² − 1)/12.

Suppose d_i = u_i − v_i ⇒ d̄ = ū − v̄ = 0 ⇒ d_i = d_i − d̄ = (u_i − ū) − (v_i − v̄) ⇒

d_i² = (u_i − ū)² + (v_i − v̄)² − 2(u_i − ū)(v_i − v̄) ,

(1/n) Σ d_i² = V(u) + V(v) − 2 cov(u, v) ,

cov(u, v) = V(u) − (1/(2n)) Σ d_i² .

Hence

r_R = cov(u, v)/(s_u s_v) = 1 − Σ d_i²/(2n V(u)) = 1 − 6 Σ d_i²/(n(n² − 1)).
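A minimal sketch computing r_S on made-up score data (the values are illustrative; the rank trick below assumes no tied values, matching the no-tie case above):

```python
import numpy as np

x = np.array([86.0, 71, 77, 68, 91, 72, 78, 98, 60, 62])   # e.g. marks in test
y = np.array([88.0, 77, 76, 64, 96, 72, 65, 90, 63, 61])   # e.g. marks in exam

rank = lambda a: a.argsort().argsort() + 1    # ranks 1..n (valid: no ties here)
u, v = rank(x), rank(y)
n = len(x)
d = u - v

r_S = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))   # Spearman's formula (no ties)
print(r_S, np.corrcoef(u, v)[0, 1])             # equals Pearson's r on the ranks
```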
Prove that −1 ≤ r_R ≤ 1:

Suppose we have n pairs of ranks (u_1, v_1), (u_2, v_2), …, (u_n, v_n).

Now Σ d_i² ≥ 0 ⇒ 6 Σ d_i²/(n(n² − 1)) ≥ 0 ⇒ 1 − 6 Σ d_i²/(n(n² − 1)) ≤ 1 ⇒ r_R ≤ 1 .......... (1)

Again, d = u − v; let d′ = u + v. Then V(d) + V(d′) = 2(V(u) + V(v)) = 4V(u), and V(d′) ≥ 0 ⇒ V(d) ≤ 4V(u). Since d̄ = 0, (1/n) Σ d_i² = V(d), so

(1/n) Σ d_i² ≤ 4(n² − 1)/12 ⇒ 6 Σ d_i²/(n(n² − 1)) ≤ 2 ⇒ 1 − 6 Σ d_i²/(n(n² − 1)) ≥ −1 ⇒ r_R ≥ −1 .......... (2)

From (1) & (2) we get −1 ≤ r_R ≤ 1.

Discuss the cases of equality:

i) Equality holds in (1), i.e. r_R = 1 ⇒ Σ d_i² = 0 ⇒ u_i = v_i ∀ i:

that is, there exists a perfect agreement between the two sets of ranks.

ii) Equality holds in (2), i.e. r_R = −1 ⇒ V(d′) = 0 ⇒ d′ = constant ⇒ u_i + v_i = k. Summing over i,

nk = Σ (u_i + v_i) = [n(n + 1)/2] × 2 = n(n + 1) ⇒ k = n + 1 ⇒ u_i = (n + 1) − v_i ,

that is, there exists a perfect disagreement between the two sets of ranks.

Residual variance:
The error variance V(e) is called the residual variance.

−×−

SOME PROBLEMS ON BIVARIATE ANALYSIS WITH THEIR SOLUTIONS (SYMBOLS HAVE THEIR USUAL MEANINGS):

1) Two variates have the least-squares regression lines x + 4y + 3 = 0 and 4x + 9y + 5 = 0. Find their mean values and r, and V(x) : V(y) = ?
Ans.

Taking x + 4y + 3 = 0 as the regression line of y on x: y = −3/4 − (1/4)x ⇒ b_yx = −1/4,

and 4x + 9y + 5 = 0 as the regression line of x on y: x = −5/4 − (9/4)y ⇒ b_xy = −9/4.

∴ r² = b_yx · b_xy = 9/16 ⇒ r = −3/4 (negative, since both regression coefficients are negative).

b_yx/b_xy = [cov(x, y)/V(x)] / [cov(x, y)/V(y)] = V(y)/V(x) = (1/4)/(9/4) = 1/9 ⇒ V(x) : V(y) = 9 : 1.

The regression equations intersect at (x̄, ȳ); solving these two equations we get x̄ = 1 and ȳ = −1.
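The intersection can be checked with a linear solve (a quick illustrative sketch):

```python
import numpy as np

# The regression lines x + 4y = -3 and 4x + 9y = -5 meet at (x̄, ȳ)
A = np.array([[1.0, 4.0], [4.0, 9.0]])
rhs = np.array([-3.0, -5.0])
print(np.linalg.solve(A, rhs))    # [ 1. -1.]  →  x̄ = 1, ȳ = -1
```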

2) Two positively correlated variables x and y have variances V(x) and V(y) respectively. Determine the value of the constant a such that x + ay and x + (s_x/s_y) y are uncorrelated.

Ans.
For uncorrelated variables we know that

Cov(x + ay, x + (s_x/s_y) y) = 0 ⇒ (1/n) Σ {(x − x̄) + a(y − ȳ)}{(x − x̄) + (s_x/s_y)(y − ȳ)} = 0

⇒ V(x) + (s_x/s_y) Cov(x, y) + a Cov(x, y) + a (s_x/s_y) V(y) = 0

⇒ a (s_x s_y + Cov(x, y)) + (s_x/s_y)(s_x s_y + Cov(x, y)) = 0  [∵ (s_x/s_y) V(y) = s_x s_y and V(x) = (s_x/s_y) s_x s_y]

⇒ a = −(s_x/s_y)  [∵ x, y are positively correlated, so Cov(x, y) > 0 and s_x s_y + Cov(x, y) ≠ 0].

3) Derive b_yx from the following pairs: (−2, 11), (−1, 7), (0, 5), (1, 9), (2, 12).
Ans.
x_i     y_i     x_i y_i     x_i²    y_i²
−2      11      −22         4       121
−1      7       −7          1       49
0       5       0           0       25
1       9       9           1       81
2       12      24          4       144
Σ: 0    44      4           10      420

b_yx = cov(x, y)/V(x) = [(1/5)(4) − (0)(44/5)] / [(1/5)(10)] = (4/5)/2 = 0.4,

and since x̄ = 0, â = ȳ = 44/5 = 8.8

⇒ Y = 8.8 + 0.4x
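Checking with numpy (illustrative sketch):

```python
import numpy as np

x = np.array([-2.0, -1, 0, 1, 2])
y = np.array([11.0, 7, 5, 9, 12])

b_yx = np.mean((x - x.mean()) * (y - y.mean())) / np.var(x)
print(b_yx, y.mean())    # 0.4 and 8.8  →  Y = 8.8 + 0.4x
```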
4) Give an example where r_xy = 0 but there exists an exact relationship between x and y.

Ans. Let us consider the 5 pairs of observations (−2, 4), (−1, 1), (0, 0), (1, 1), (2, 4).

We have to find r_xy; consider the following table:

x_i     y_i     x_i²    y_i²    x_i y_i
−2      4       4       16      −8
−1      1       1       1       −1
0       0       0       0       0
1       1       1       1       1
2       4       4       16      8
Σ: 0    10      10      34      0

Cov(x, y) = (1/n) Σ x_i y_i − x̄ ȳ = 0 − 0 = 0 ⇒ r_xy = 0.

But by observation we see that x and y are in the exact relation y = x².
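The same point in two lines of numpy (illustrative sketch):

```python
import numpy as np

x = np.array([-2.0, -1, 0, 1, 2])
y = x**2                           # exact nonlinear relation y = x²

print(np.corrcoef(x, y)[0, 1])     # 0.0: no *linear* association
```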
5) If y = 3x + 5, then find r_xy = ?

Ans. y_i = 3x_i + 5 ⇒ ȳ = 3x̄ + 5 ⇒ (y_i − ȳ) = 3(x_i − x̄). Now we have to find r_xy:

r_xy = cov(x, y)/(s_x s_y) = [(3/n) Σ (x_i − x̄)²] / (s_x · 3s_x) = 3V(x)/(3V(x)) = 1,

[∵ (1/n) Σ (y_i − ȳ)² = (9/n) Σ (x_i − x̄)² ⇒ s_y = 3 s_x].
6) Why do we use two regression lines?

Ans. When there is a reasonable amount of scatter, we can draw two different regression lines depending upon which variable we wish to predict:
1) The first line is the regression of y (dependent variable) on x (independent variable); it can be used to estimate the dependent variable y, given the value of x.
2) The other line is the regression of x on y; it can be used to estimate x, given the value of y.

Note: if there exists a perfect correlation (positive or negative), then the two regression lines coincide.
