
LECTURE NOTES NO. 5, M235
Chapter 5
Joint Distributions

The Joint Distribution of Two Discrete Random Variables:
Example: An experiment consists of three tosses of a fair coin. The sample space is
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.
Let the random variable X be the number of heads.
A random variable is a function that maps outcomes to numbers:
X: S (the sample space) → R (the set of real numbers)
X(HHH)=3, X(HHT)=2, X(HTH)=2, X(HTT)=1, X(THH)=2, X(THT)=1, X(TTH)=1, X(TTT)=0.
Then the possible values of X are 0, 1, 2, 3.
The probability distribution of X is given by Table 1:

Table 1
x         0     1     2     3
P(X=x)   1/8   3/8   3/8   1/8
Let the random variable Y be the number of tails that precede the first head (if no heads come up, we let Y = 3). Again, Y is a function that maps outcomes to numbers:
Y(HHH)=0, Y(HHT)=0, Y(HTH)=0, Y(HTT)=0, Y(THH)=1, Y(THT)=1, Y(TTH)=2, Y(TTT)=3.
Then the possible values of Y are 0, 1, 2, and 3.
The probability distribution of Y is given by Table 2:

Table 2
y         0     1     2     3
P(Y=y)   4/8   2/8   1/8   1/8

Now look at events involving both X and Y, for example P(X = 1 and Y = 1).

P(X = 1 and Y = 1) = P(THT) = 1/8,

because the event "X = 1 and Y = 1" occurs only if the outcome of the experiment is THT. But notice that this answer, 1/8, cannot be found from Tables 1 and 2.
Notation:
P(X = x and Y = y) = P(X = x, Y = y)
Tables 1 and 2 are fine for computing probabilities for X and Y alone, but from these tables there is no way to know that P(X = 1 and Y = 1) = 1/8.
What do we need?
It seems clear that in this example, the probabilities in the following table give us enough information:
Table 3
                     y
p(x, y)     0     1     2     3
     0      0     0     0    1/8
x    1     1/8   1/8   1/8    0
     2     2/8   1/8    0     0
     3     1/8    0     0     0
The notation "p(x, y)" refers to P(X = x and Y = y); for example, p(2, 0) = P(X = 2 and Y = 0) = 2/8. Notice that the 16 probabilities in the table add to 1.
The entries in Table 3 amount to the specification of a function of x and y, p(x, y) = P(X = x and Y = y). It is called the joint probability mass function of X and Y.
Table 4
                     y
p(x, y)     0     1     2     3    P(X=x)
     0      0     0     0    1/8    1/8
x    1     1/8   1/8   1/8    0     3/8
     2     2/8   1/8    0     0     3/8
     3     1/8    0     0     0     1/8
P(Y=y)     4/8   2/8   1/8   1/8
Notice from Table 4 that we can get Tables 1 and 2, the individual mass functions of X and Y, by adding across the rows and down the columns. In this context the probability mass functions of X and Y are called marginal probability mass functions and are denoted by pX(x) and pY(y).
Definition. The joint probability mass function of two discrete random variables X and Y is the function p(x, y), defined for all pairs of real numbers x and y by
p(x, y) = P(X = x and Y = y),
and it satisfies the following conditions:

1. p(x, y) ≥ 0, for all x and y;

2. ∑_x ∑_y p(x, y) = 1;

3. P(a ≤ X ≤ b, c ≤ Y ≤ d) = ∑_{a≤x≤b} ∑_{c≤y≤d} p(x, y).

Theorem: The marginal probability mass functions can be obtained from the joint probability mass function:

P(X = x) = ∑_y P(X = x, Y = y)

P(Y = y) = ∑_x P(X = x, Y = y)
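As an illustration (an added sketch, not part of the original notes, assuming Python with the standard library), the joint pmf of Table 4 can be stored as a dictionary and the marginal pmfs recovered by summing over the other variable:

```python
from fractions import Fraction as F
from collections import defaultdict

# Joint pmf of Table 4: p[(x, y)] = P(X = x, Y = y); missing pairs have probability 0
p = {
    (0, 3): F(1, 8),
    (1, 0): F(1, 8), (1, 1): F(1, 8), (1, 2): F(1, 8),
    (2, 0): F(2, 8), (2, 1): F(1, 8),
    (3, 0): F(1, 8),
}

# Marginals: sum the joint pmf over the other variable
pX, pY = defaultdict(F), defaultdict(F)
for (x, y), prob in p.items():
    pX[x] += prob
    pY[y] += prob

print(dict(pX))         # matches Table 1: {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}
print(dict(pY))         # matches Table 2 (fractions reduced): 4/8, 2/8, 1/8, 1/8
print(sum(p.values()))  # 1, as required of a joint pmf
```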

Independence of random variables:

Recall: events A and B are independent ⟺ P(A ∩ B) = P(A) P(B).
Two r.v.'s X and Y are called independent when every event concerning X and every event concerning Y are independent.
General definition:
X and Y are independent ⟺
P(X ∈ A and Y ∈ B) = P(X ∈ A) P(Y ∈ B),
for any combination A ⊆ R and B ⊆ R.

For discrete variables this is equivalent to:
X and Y are independent ⟺
P(X = x, Y = y) = P(X = x) P(Y = y), for all x and y.
X and Y are dependent if they are not independent.
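A quick, self-contained Python check of this criterion for the coin-tossing example (an added sketch; the variable names are chosen only for illustration):

```python
from fractions import Fraction as F

# Joint pmf of Table 4 (coin example); missing pairs have probability 0
p = {(0, 3): F(1, 8), (1, 0): F(1, 8), (1, 1): F(1, 8), (1, 2): F(1, 8),
     (2, 0): F(2, 8), (2, 1): F(1, 8), (3, 0): F(1, 8)}
xs, ys = range(4), range(4)
pX = {x: sum(p.get((x, y), F(0)) for y in ys) for x in xs}
pY = {y: sum(p.get((x, y), F(0)) for x in xs) for y in ys}

# X and Y are independent iff p(x, y) = pX(x) * pY(y) for ALL pairs (x, y)
independent = all(p.get((x, y), F(0)) == pX[x] * pY[y] for x in xs for y in ys)
print(independent)  # False: e.g. p(0, 0) = 0 but pX(0) * pY(0) = (1/8)(4/8) = 1/16
```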
Theorem:

E[g(X, Y)] = ∑_x ∑_y g(x, y) P(X = x, Y = y)

e.g.

E(XY) = ∑_x ∑_y x y P(X = x, Y = y)

It follows that E(X + Y) = E(X) + E(Y).
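As a sanity check (an added illustrative sketch, not from the notes), this theorem can be applied to the coin-example pmf to compute E(X), E(Y), E(XY) and verify E(X + Y) = E(X) + E(Y):

```python
from fractions import Fraction as F

# Joint pmf of Table 4 (coin example)
p = {(0, 3): F(1, 8), (1, 0): F(1, 8), (1, 1): F(1, 8), (1, 2): F(1, 8),
     (2, 0): F(2, 8), (2, 1): F(1, 8), (3, 0): F(1, 8)}

def expect(g):
    """E[g(X, Y)] = sum over (x, y) of g(x, y) * P(X = x, Y = y)."""
    return sum(g(x, y) * prob for (x, y), prob in p.items())

EX  = expect(lambda x, y: x)      # 3/2
EY  = expect(lambda x, y: y)      # 7/8
EXY = expect(lambda x, y: x * y)  # 5/8
print(EX, EY, EXY)
print(expect(lambda x, y: x + y) == EX + EY)  # True: E(X + Y) = E(X) + E(Y)
```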

Definition: The conditional probability mass function of X, given Y = y, denoted by pX|Y(x|y), is defined by

pX|Y(x|y) = P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)

for fixed y, provided P(Y = y) > 0.

The conditional expectation is:

E(X | Y = y) = ∑_x x P(X = x | Y = y)
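A small illustrative Python helper (the function names are assumptions made here, not part of the notes) that computes a conditional pmf and conditional expectation from any discrete joint pmf stored as a dictionary:

```python
from fractions import Fraction as F

def conditional_pmf(p, y):
    """pX|Y(x | y) = P(X = x, Y = y) / P(Y = y), defined when P(Y = y) > 0."""
    py = sum(prob for (x, yy), prob in p.items() if yy == y)
    if py == 0:
        raise ValueError("P(Y = y) must be positive")
    return {x: prob / py for (x, yy), prob in p.items() if yy == y}

def conditional_expectation(p, y):
    """E(X | Y = y) = sum over x of x * pX|Y(x | y)."""
    return sum(x * q for x, q in conditional_pmf(p, y).items())

# Coin example (Table 4): condition on Y = 0
p = {(0, 3): F(1, 8), (1, 0): F(1, 8), (1, 1): F(1, 8), (1, 2): F(1, 8),
     (2, 0): F(2, 8), (2, 1): F(1, 8), (3, 0): F(1, 8)}
print(conditional_pmf(p, 0))          # {1: 1/4, 2: 1/2, 3: 1/4}
print(conditional_expectation(p, 0))  # 2
```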

Example: Consider the following joint probability mass function:

                     y
p(x, y)     0     1     2     3
     0     1/8   1/8    0     0
x    1      0    2/8   2/8    0
     2      0     0    1/8   1/8
a. Find P(X + Y = 2).
P(X + Y = 2) = p(0, 2) + p(1, 1) + p(2, 0) = 0 + 2/8 + 0 = 2/8
b. Find P(X > Y).
P(X > Y) = p(1, 0) + p(2, 0) + p(2, 1) = 0
c. Find the marginal probability mass function of X.

P(X = x) = ∑_{y=0}^{3} P(X = x, Y = y)

P(X = 0) = p(0,0) + p(0,1) + p(0,2) + p(0,3) = 1/8 + 1/8 + 0 + 0 = 2/8

P(X = 1) = p(1,0) + p(1,1) + p(1,2) + p(1,3) = 0 + 2/8 + 2/8 + 0 = 4/8

P(X = 2) = p(2,0) + p(2,1) + p(2,2) + p(2,3) = 0 + 0 + 1/8 + 1/8 = 2/8
d. Find the conditional p.m.f. of X, given Y = 1.

pX|Y(x | 1) = P(X = x, Y = 1) / P(Y = 1)

But pY(1) = 3/8. Therefore

pX|Y(x | 1) = P(X = x, Y = 1) / (3/8), x = 0, 1, 2.

Thus
pX|Y(0 | 1) = p(0, 1) / P(Y = 1) = (1/8) / (3/8) = 1/3
pX|Y(1 | 1) = p(1, 1) / P(Y = 1) = (2/8) / (3/8) = 2/3
pX|Y(2 | 1) = p(2, 1) / P(Y = 1) = 0 / (3/8) = 0
These probabilities can be represented in the following table:

x              0     1     2
P(X=x|Y=1)    1/3   2/3    0

e. Find P(0 ≤ X ≤ 1 | Y = 1).

P(0 ≤ X ≤ 1 | Y = 1) = ∑_{x=0,1} P(X = x | Y = 1) = pX|Y(0 | 1) + pX|Y(1 | 1) = 1/3 + 2/3 = 1
f. Are X and Y independent?
X and Y are independent if
P(X = x, Y = y) = pX(x) · pY(y) for all x and y.
Check first whether p(0, 0) = pX(0) · pY(0):
1/8 ≠ (2/8)(1/8)
So X and Y are not independent.
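A short illustrative Python check of parts (c), (d), and (f) of this example (added sketch; the dictionary layout is an assumption made here):

```python
from fractions import Fraction as F

# Joint pmf from the example table
p = {(0, 0): F(1, 8), (0, 1): F(1, 8),
     (1, 1): F(2, 8), (1, 2): F(2, 8),
     (2, 2): F(1, 8), (2, 3): F(1, 8)}
xs, ys = range(3), range(4)

pX = {x: sum(p.get((x, y), F(0)) for y in ys) for x in xs}   # (c) marginal of X
pY = {y: sum(p.get((x, y), F(0)) for x in xs) for y in ys}
cond = {x: p.get((x, 1), F(0)) / pY[1] for x in xs}          # (d) pmf of X given Y = 1

print(pX)    # {0: 1/4, 1: 1/2, 2: 1/4}, i.e. 2/8, 4/8, 2/8 as in part (c)
print(cond)  # {0: 1/3, 1: 2/3, 2: 0} as in part (d)
# (f) independence already fails at (0, 0): p(0,0) = 1/8 but pX(0)*pY(0) = 1/32
print(all(p.get((x, y), F(0)) == pX[x] * pY[y] for x in xs for y in ys))  # False
```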
Joint probability mass functions of several variables:
Let X1, …, Xn be discrete random variables. The joint probability mass function is

P(x1, …, xn) = P(X1 = x1, …, Xn = xn),

where x1, …, xn are possible values of X1, …, Xn, respectively. P(x1, …, xn) satisfies the following conditions:

1. P(x1, …, xn) ≥ 0 for all x1, …, xn;

2. ∑_{x1} ⋯ ∑_{xn} P(x1, …, xn) = 1.

Independence of Multiple Random Variables:

Let X1, …, Xn be random variables. X1, …, Xn are independent if and only if

P(X1 = x1, …, Xn = xn) = P(X1 = x1) ⋯ P(Xn = xn),

for all (x1, …, xn).


Measure for dependency: covariance
Definition:
Suppose that X and Y are random variables with joint p.m.f. P(X = x, Y = y).
The covariance of X and Y is defined by
Cov(X, Y) = E([X − E(X)][Y − E(Y)]).
For discrete X and Y:

Cov(X, Y) = ∑_x ∑_y (x − E(X))(y − E(Y)) P(X = x, Y = y)

Covariance is a measure of dependence between X and Y; it is zero when they are independent.
We say that X and Y are uncorrelated if cov(X, Y) = 0.
Result :
cov(X, Y) = E(XY) – EX×EY
Properties of covariance:
1. cov(X, Y) = cov(Y, X)
2. cov(X, X) = var(X)
3. cov(X + a, Y) = cov(X, Y)
4. cov(aX, Y) = a cov(X, Y)
5. cov(X, Y + Z) = cov(X, Y) + cov(X, Z)
6. var(X+Y) = var(X)+var(Y)+2cov(X,Y)
Property 7, for X1, …, Xn:

var(∑_{i=1}^{n} Xi) = ∑_{i=1}^{n} var(Xi) + ∑_{i≠j} cov(Xi, Xj)

Properties for independent X and Y:

1. E(XY) = E(X) × E(Y)
2. cov(X, Y) = 0
3. var(X + Y) = var(X) + var(Y)

Note 1: In general E(X + Y) = E(X) + E(Y), but E(XY) need not equal E(X) × E(Y).
Note 2: independence implies no correlation, but not vice versa.
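To illustrate Note 2 (an added example, not in the original notes): take X uniform on {−1, 0, 1} and Y = X². Then cov(X, Y) = 0, yet Y is completely determined by X, so they are not independent. A quick Python check:

```python
from fractions import Fraction as F

# X uniform on {-1, 0, 1}, Y = X^2: joint pmf p(x, y)
p = {(-1, 1): F(1, 3), (0, 0): F(1, 3), (1, 1): F(1, 3)}

EX  = sum(x * q for (x, y), q in p.items())      # 0
EY  = sum(y * q for (x, y), q in p.items())      # 2/3
EXY = sum(x * y * q for (x, y), q in p.items())  # 0
print(EXY - EX * EY)                             # cov(X, Y) = 0

# Not independent: P(X = 0, Y = 0) = 1/3 but P(X = 0) * P(Y = 0) = 1/9
print(p[(0, 0)], F(1, 3) * F(1, 3))
```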
The value of the covariance:
- is positive if large values of X generally coincide with large values of Y, and vice versa;
- depends on the unit of measurement.
The correlation coefficient ρ(X, Y) does not depend on the unit of measurement:

ρ(X, Y) = Corr(X, Y) = Cov(X, Y) / (√Var(X) √Var(Y)) = Cov(X, Y) / (σX σY)

Properties of ρ(X, Y):
1. −1 ≤ ρ(X, Y) ≤ 1
So ρ(X, Y) is a standardized measure of the linear dependence of X and Y.

Example:
Suppose that X and Y have the joint p.m.f. p(x, y) as given in the following table:

                       y
p(x, y)     1      2      3     P(X=x)
    0      0.02   0.05   0.15    0.22
x   1      0.08   0.24   0.05    0.37
    2      0.23   0.15   0.03    0.41
P(Y=y)     0.33   0.44   0.23

E(X) = 0(0.22) + 1(0.37) + 2(0.41) = 1.19
E(Y) = 1(0.33) + 2(0.44) + 3(0.23) = 1.90
E(XY) = 0(1)(0.02) + 0(2)(0.05) + 0(3)(0.15) + 1(1)(0.08) + 1(2)(0.24) + 1(3)(0.05) + 2(1)(0.23) + 2(2)(0.15) + 2(3)(0.03) = 1.95
Consequently,
Cov(X, Y) = E(XY) − E(X)E(Y) = 1.95 − 1.19(1.90) = −0.311
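An illustrative Python re-computation of this example (added sketch, not part of the original notes):

```python
# Joint pmf of the covariance example: p[(x, y)] = P(X = x, Y = y)
p = {(0, 1): 0.02, (0, 2): 0.05, (0, 3): 0.15,
     (1, 1): 0.08, (1, 2): 0.24, (1, 3): 0.05,
     (2, 1): 0.23, (2, 2): 0.15, (2, 3): 0.03}

EX  = sum(x * q for (x, y), q in p.items())      # 1.19
EY  = sum(y * q for (x, y), q in p.items())      # 1.90
EXY = sum(x * y * q for (x, y), q in p.items())  # 1.95
cov = EXY - EX * EY
print(round(EX, 2), round(EY, 2), round(EXY, 2), round(cov, 3))  # 1.19 1.9 1.95 -0.311

# Correlation coefficient rho = cov / (sigma_X * sigma_Y)
varX = sum((x - EX) ** 2 * q for (x, y), q in p.items())
varY = sum((y - EY) ** 2 * q for (x, y), q in p.items())
print(round(cov / (varX ** 0.5 * varY ** 0.5), 3))
```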
Joint distribution of two continuous random variables
If there is a function fX,Y(x, y) so that for every B ⊆ R²

P((X, Y) ∈ B) = ∬_{(x,y)∈B} fX,Y(x, y) dx dy,

then X and Y are jointly continuous random variables, and fX,Y(x, y) is called the joint probability density function.

As X and Y are continuous, the probability that (X, Y) falls in a rectangle is

P(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫_c^d ∫_a^b f(x, y) dx dy

Requirements for a joint density function f(x, y):

1. f(x, y) ≥ 0, for all x and y;

2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1.

Notation: f(x, y) = fX,Y(x, y)

Example: Let X and Y have joint probability density function

f(x, y) = 4xy for 0 < x < 1, 0 < y < 1, and f(x, y) = 0 otherwise.

Marginal probability density functions:
Let X and Y be jointly continuous random variables. The marginal probability density functions of X and Y are

fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy,
and
fY(y) = ∫_{−∞}^{∞} fX,Y(x, y) dx.
Conditional probability density function:
Let X and Y be continuous random variables. The conditional probability density function of X given Y = y is

fX|Y(x|y) = fX,Y(x, y) / fY(y),

where fY(y) > 0.
Independence of Two Random Variables:
Let X and Y be jointly continuous random variables. X and Y are independent if and only if

fX,Y(x, y) = fX(x) fY(y), for all x ∈ R and y ∈ R.

Example: Let X and Y have joint probability density function

f(x, y) = 4xy for 0 < x < 1, 0 < y < 1, and f(x, y) = 0 otherwise.

(a) Find P(X < 1/4, Y < 1/2).
P(X < 1/4, Y < 1/2) = ∫_0^{1/2} ∫_0^{1/4} 4xy dx dy
                    = ∫_0^{1/2} 4y ( ∫_0^{1/4} x dx ) dy
                    = ∫_0^{1/2} 4y ( x²/2 |_0^{1/4} ) dy
                    = ∫_0^{1/2} 4y (1/32) dy
                    = (1/8) ∫_0^{1/2} y dy = (1/8) ( y²/2 |_0^{1/2} ) = 1/64
(b) Find the marginal probability density functions of X and Y.

fX(x) = ∫_0^1 4xy dy = 4x ∫_0^1 y dy = 4x ( y²/2 |_0^1 ) = 2x, 0 < x < 1

fY(y) = ∫_0^1 4xy dx = 4y ∫_0^1 x dx = 4y ( x²/2 |_0^1 ) = 2y, 0 < y < 1
(c) Find the conditional probability density function of X given Y = y, fX|Y(x|y).

fX|Y(x|y) = fX,Y(x, y) / fY(y) = 4xy / 2y = 2x, 0 < x < 1

(d) Are X and Y independent?
Check whether fX,Y(x, y) = fX(x) fY(y):
fX(x) fY(y) = (2x)(2y) = 4xy = fX,Y(x, y) for all 0 < x < 1, 0 < y < 1.

Then X and Y are independent.
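As an illustrative numerical check of parts (a) and (b) (added sketch, assuming SciPy is available; not part of the notes):

```python
from scipy import integrate

# Joint density f(x, y) = 4xy on the unit square; dblquad integrates func(y, x)
f = lambda y, x: 4 * x * y

# (a) P(X < 1/4, Y < 1/2): x from 0 to 1/4 and y from 0 to 1/2
prob, err = integrate.dblquad(f, 0, 0.25, lambda x: 0, lambda x: 0.5)
print(prob, 1 / 64)  # both are 0.015625

# (b) the marginal fX(x) = integral of 4xy over y in (0, 1) should equal 2x
for x in (0.25, 0.5, 0.75):
    fx, _ = integrate.quad(lambda y: 4 * x * y, 0, 1)
    print(x, fx, 2 * x)  # the numerical value matches 2x
```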


For the expectation we can derive:

E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f(x, y) dx dy

In particular:

E(XY) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y f(x, y) dx dy

Joint probability distribution of more than two random variables
Let X1, …, Xn be continuous random variables. The joint probability density function satisfies the following conditions:

1. f(x1, …, xn) ≥ 0 for −∞ < x1, …, xn < ∞;

2. ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} f(x1, …, xn) dx1 ⋯ dxn = 1.

The properties for two variables can be easily extended to 3 or more variables. Some examples:

F_{X1,…,Xn}(x1, …, xn) = P(X1 ≤ x1, …, Xn ≤ xn)

f_{X1}(x1) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x1, x2, x3) dx2 dx3 (for n = 3)

Independence of Multiple Random Variables:

Let X1, …, Xn be random variables. X1, …, Xn are independent if and only if

f(x1, …, xn) = f1(x1) ⋯ fn(xn),

for all (x1, …, xn), where fj is the marginal probability distribution (or density) function of Xj.
Result:
If X ~ N(µ1, σ1²) and Y ~ N(µ2, σ2²) and X and Y are independent, then
X + Y ~ N(µ1 + µ2, σ1² + σ2²).
Results:
- E(X1 + … + Xn) = E(X1) + … + E(Xn)
If X1, …, Xn are independent, then
- var(X1 + … + Xn) = var(X1) + … + var(Xn)
- f_{X1,…,Xn}(x1, …, xn) = ∏_{i=1}^{n} f_{Xi}(xi)

In statistics we call X1, …, Xn a random sample if
1. X1, …, Xn are independent;
2. X1, …, Xn all have the same distribution.
This common distribution is called the population distribution, so if E(Xi) = µ and var(Xi) = σ², then:

- The sum ∑_{i=1}^{n} Xi satisfies
  E(∑_{i=1}^{n} Xi) = nµ and var(∑_{i=1}^{n} Xi) = nσ².

- The sample mean X̄ = (1/n) ∑_{i=1}^{n} Xi satisfies
  1. E(X̄) = µ, 2. var(X̄) = σ²/n, so SD(X̄) = σ/√n.
Proof of (1):
E(X̄) = E((1/n) ∑_{i=1}^{n} Xi) = (1/n) ∑_{i=1}^{n} E(Xi) = (1/n)(nµ) = µ.

Proof of (2):
Since X1, …, Xn are independent,
var(X̄) = var((1/n) ∑_{i=1}^{n} Xi) = (1/n²) ∑_{i=1}^{n} var(Xi) = (1/n²)(nσ²) = σ²/n.
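An illustrative simulation (added sketch, assuming NumPy; not from the notes) that checks E(X̄) = µ and var(X̄) = σ²/n by drawing many random samples:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 10.0, 3.0, 25, 100_000

# Draw `reps` random samples of size n and compute each sample mean
samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)

print(xbar.mean())  # close to mu = 10
print(xbar.var())   # close to sigma^2 / n = 9 / 25 = 0.36
print(xbar.std())   # close to sigma / sqrt(n) = 0.6
```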

Random samples and the normal distribution

If X1, …, Xn are independent and Xi ~ N(µ, σ²) (i = 1, …, n), then:

- The sum ∑_{i=1}^{n} Xi ~ N(nµ, nσ²)

- The sample mean X̄ ~ N(µ, σ²/n)

Example: Suppose a random sample of size 25 is selected from a population that is normally distributed with mean 106 and standard deviation 12. Find:
a. The mean and standard deviation of the sampling distribution of the sample mean.

mean of X̄ = 106
SD of X̄ = σ/√n = 12/√25 = 2.4

b. P(X̄ − µ < 4).

P( (X̄ − µ)/(σ/√n) < 4/(12/√25) ) = P(Z < 1.67) = 0.9525

The Central Limit Theorem
If X1, X2, … are independent, each with E(Xi) = µ and var(Xi) = σ², i = 1, 2, …, then for large n

X̄ ≈ N(µ, σ²/n)

or

Sn ≈ N(nµ, nσ²),

where Sn = X1 + X2 + … + Xn (the sample sum).
Rule of thumb: n > 30 is large.
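An illustrative simulation of the CLT (added sketch, assuming NumPy; not from the notes): sample means of size n = 36 from a skewed exponential population already look approximately normal:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 36, 50_000
mu = sigma = 1.0  # Exponential(1) population: mean 1, standard deviation 1

# Sample means of `reps` random samples of size n
xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

# Compare the simulated sample means with the CLT approximation N(mu, sigma^2/n)
print(xbar.mean(), mu)                  # close to 1
print(xbar.std(), sigma / np.sqrt(n))   # close to 1/6
# Fraction of sample means below 1.1, to compare with P(Z < 0.6) ~ 0.726
print((xbar < 1.1).mean())
```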
Example: Suppose that a random sample of size n = 100 is drawn from a population with mean 70 and standard deviation 20. What is the probability that the sample mean X̄ will be less than 73?

P(X̄ < 73) ≈ P( (X̄ − µ)/(σ/√n) < (73 − 70)/(20/√100) ) = P(Z < 1.5) = 0.9332

Normal Approximation to
Binomial
Shape of the Binomial Distribution
The shape of the binomial distribution depends on
the values of n and p.

A Bernoulli trial is any trial with only two possible outcomes (denoted success and failure).
A Bernoulli random variable takes only two values, 0 and 1.
Let p : success probability
1 − p : failure probability

So the Bernoulli distribution is

x         0      1
P(X=x)   1-p     p

The mean and variance of a Bernoulli random variable are
µ = 1(p) + 0(1 − p) = p
σ² = [0²(1 − p) + 1²(p)] − p² = p(1 − p)
Relation between Bernoulli and Binomial Distributions
Consider n independent Bernoulli trials.
Let Xi : Bernoulli(p), i = 1, …, n, with

Xi = 1 if success (with probability p), Xi = 0 if failure (with probability 1 − p).

Then
X1 + X2 + … + Xn = sum of n independent Bernoulli random variables
                 = number of successes in n independent trials
                 = X : B(n, p)

Using the CLT, for large n,

Sn ≈ N(nµ, nσ²),

where Sn = X1 + X2 + … + Xn = X : B(n, p).
Then, for large n and/or p close to ½,

X : B(n, p) ≈ N(nµ, nσ²).

Note: µ = p and σ² = p(1 − p), so

X : B(n, p) ≈ N(np, np(1 − p))

(Binomial approximated by Normal)


Example:
Let X : B(100, 1/2).
Find P(20 ≤ X ≤ 80).

The exact answer is

P(20 ≤ X ≤ 80) = ∑_{i=20}^{80} C(100, i) (1/2)^{100}.

Not easy!
Using the CLT, to find P(a ≤ X ≤ b) with the normal approximation:

P(a ≤ X ≤ b) ≈ P( (a − np)/√(np(1−p)) ≤ (X − np)/√(np(1−p)) ≤ (b − np)/√(np(1−p)) )
             = P( (a − np)/√(np(1−p)) ≤ Z ≤ (b − np)/√(np(1−p)) )

Using the z-table, we get the answer.
Approximating a discrete distribution by a continuous distribution
Example: Suppose we toss a fair coin 20 times. Let X be the random variable representing the number of heads thrown.
X ~ Bin(20, ½)
In a diagram of this distribution, rectangles represent the binomial probabilities and the smooth curve is the approximating normal distribution.
We want P(9 ≤ X ≤ 11), the area of the corresponding rectangles. Notice that the first rectangle starts at 8.5 and the last rectangle ends at 11.5. Using a continuity correction, therefore, our probability becomes P(8.5 < X < 11.5) under the normal distribution.
In general,

P(a ≤ X ≤ b) ≈ P(a − 0.5 ≤ X ≤ b + 0.5)
             ≈ P( (a − 0.5 − np)/√(np(1−p)) ≤ Z ≤ (b + 0.5 − np)/√(np(1−p)) )
Example: Use the normal approximation for a binomial distribution with n = 10 and p = 0.5 to find P(3 ≤ X ≤ 7).

With the continuity correction, P(3 ≤ X ≤ 7) ≈ P(2.5 ≤ X ≤ 7.5).
np = 5, np(1 − p) = 2.5

P(2.5 ≤ X ≤ 7.5) ≈ P( (2.5 − np)/√(np(1−p)) ≤ Z ≤ (7.5 − np)/√(np(1−p)) ) = P(−1.58 ≤ Z ≤ 1.58) = 0.8858

Exact answer:

P(3 ≤ X ≤ 7) = ∑_{i=3}^{7} C(10, i) (1/2)^{10}
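An illustrative SciPy check of this example (assumed library, not in the notes), comparing the exact binomial probability with the continuity-corrected normal approximation:

```python
from math import sqrt
from scipy.stats import binom, norm

n, p = 10, 0.5
mu, sd = n * p, sqrt(n * p * (1 - p))  # 5 and sqrt(2.5)

exact = binom.cdf(7, n, p) - binom.cdf(2, n, p)                  # exact P(3 <= X <= 7)
approx = norm.cdf((7.5 - mu) / sd) - norm.cdf((2.5 - mu) / sd)   # continuity correction

print(exact)   # ~0.8906
print(approx)  # ~0.886 (the z-table value with z = 1.58 is 0.8858)
```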
