Chapter 5
Joint Distributions
The Joint Distribution of Two Discrete Random Variables:
Example: An experiment consists of three tosses of a fair coin. The sample space is
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.
Let the random variable X be the number of
heads.
A random variable is a function that maps outcomes to numbers:
X : S (the sample space) → R (the set of real numbers)
X(HHH) = 3, X(HHT) = 2, X(HTH) = 2, X(HTT) = 1, X(THH) = 2, X(THT) = 1, X(TTH) = 1, X(TTT) = 0.
Then the possible values of X are 0, 1, 2, and 3.
The probability distribution of X is given by Table 1:

Table 1
x         0     1     2     3
P(X=x)    1/8   3/8   3/8   1/8
Let the random variable Y be the number of tails that precede the first head (if no heads come up, we let Y = 3).
Y(HHH) = 0, Y(HHT) = 0, Y(HTH) = 0, Y(HTT) = 0, Y(THH) = 1, Y(THT) = 1, Y(TTH) = 2, Y(TTT) = 3.
Then the possible values of Y are 0, 1, 2, and 3.
The probability distribution of Y is given by Table 2:

Table 2
y         0     1     2     3
P(Y=y)    4/8   2/8   1/8   1/8
Table 3 gives the joint distribution of X and Y:

Table 3
                    y
p(x, y)     0     1     2     3
      0     0     0     0     1/8
x     1     1/8   1/8   1/8   0
      2     2/8   1/8   0     0
      3     1/8   0     0     0
The notation “p(x, y)” refers to P(X = x and Y = y); for example, p(2, 0) = P(X = 2 and Y = 0) = 2/8. Notice that the 16 probabilities in the table add to 1.
The entries in Table 3 amount to the specification of a function of x and y, p(x, y) = P(X = x and Y = y). It is called the joint probability mass function of X and Y.
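To make the construction concrete, here is a short Python sketch (our own illustration, not part of the original notes) that rebuilds Table 3 directly from the sample space; the helper names X, Y, and p are ours.

```python
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes of three tosses of a fair coin.
outcomes = ["".join(t) for t in product("HT", repeat=3)]

def X(w):  # number of heads
    return w.count("H")

def Y(w):  # number of tails preceding the first head (3 if no heads)
    return w.index("H") if "H" in w else 3

# Joint pmf p(x, y) = P(X = x and Y = y).
p = {}
for w in outcomes:
    key = (X(w), Y(w))
    p[key] = p.get(key, Fraction(0)) + Fraction(1, 8)

for (x, y), prob in sorted(p.items()):
    print(f"p({x}, {y}) = {prob}")  # e.g. p(2, 0) = 1/4, i.e. 2/8
assert sum(p.values()) == 1         # the probabilities add to 1
```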
Table 4 adds the row and column totals:

Table 4
                    y
p(x, y)     0     1     2     3     P(X=x)
      0     0     0     0     1/8   1/8
x     1     1/8   1/8   1/8   0     3/8
      2     2/8   1/8   0     0     3/8
      3     1/8   0     0     0     1/8
P(Y=y)      4/8   2/8   1/8   1/8
Notice from Table 4 that we can get Tables 1 and 2 – the individual mass functions of X and Y – by adding across the rows and down the columns. In this context the probability mass functions of X and Y are called marginal probability mass functions and are denoted by pX(x) and pY(y).
Definition: The joint probability mass function of two discrete random variables X and Y is the function p(x, y), defined for all pairs of real numbers x and y by
p(x, y) = P(X = x and Y = y),
and it satisfies the following two conditions:
1. p(x, y) ≥ 0 for all x and y;
2. Σ_x Σ_y p(x, y) = 1.
Theorem: The marginal probability mass functions can be obtained from the joint probability mass function:
P(X = x) = Σ_y P(X = x, Y = y),
and similarly P(Y = y) = Σ_x P(X = x, Y = y).
Definition: The discrete random variables X and Y are independent if and only if
P(X = x, Y = y) = P(X = x) P(Y = y) for all x and y.
X and Y are dependent if they are not independent.
Theorem:
E{g(X, Y)} = Σ_x Σ_y g(x, y) P(X = x, Y = y)
e.g.
E(XY) = Σ_x Σ_y x y P(X = x, Y = y)
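The double sum is mechanical to evaluate. As a small illustration of ours (reusing the coin-toss joint pmf from Table 3), this sketch computes E(XY); for that table the sum works out to 5/8.

```python
from fractions import Fraction as F

# Joint pmf of the coin-toss example (Table 3): p[(x, y)] = P(X=x, Y=y).
p = {(0, 3): F(1, 8),
     (1, 0): F(1, 8), (1, 1): F(1, 8), (1, 2): F(1, 8),
     (2, 0): F(2, 8), (2, 1): F(1, 8),
     (3, 0): F(1, 8)}

# E(XY) = sum over all (x, y) of x * y * p(x, y).
E_XY = sum(x * y * prob for (x, y), prob in p.items())
print(E_XY)  # 5/8
```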
Definition: The conditional probability mass function of X, given Y = y, denoted by pX|Y(x|y), is defined by
pX|Y(x|y) = P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y).
Example: Suppose the discrete random variables X and Y have the joint probability mass function p(x, y) given in the following table:

                    y
p(x, y)     0     1     2     3
      0     1/8   1/8   0     0
x     1     0     2/8   2/8   0
      2     0     0     1/8   1/8

a. Find P(X + Y = 2).
P(X + Y = 2) = p(0, 2) + p(1, 1) + p(2, 0) = 0 + 2/8 + 0 = 2/8.
b. Find P(X > Y).
P(X > Y) = p(1, 0) + p(2, 0) + p(2, 1) = 0 + 0 + 0 = 0.
c. Find the marginal probability mass function of X.
P(X = x) = Σ_{y=0}^{3} P(X = x, Y = y)
P(X = 0) = p(0,0) + p(0,1) + p(0,2) + p(0,3) = 1/8 + 1/8 + 0 + 0 = 2/8
P(X = 1) = p(1,0) + p(1,1) + p(1,2) + p(1,3) = 0 + 2/8 + 2/8 + 0 = 4/8
P(X = 2) = p(2,0) + p(2,1) + p(2,2) + p(2,3) = 0 + 0 + 1/8 + 1/8 = 2/8
d. Find the conditional p.m.f. of X, given Y = 1.
pX|Y(x|1) = P(X = x | Y = 1) = p(x, 1) / P(Y = 1)
Here P(Y = 1) = p(0,1) + p(1,1) + p(2,1) = 3/8. Therefore
pX|Y(x|1) = p(x, 1) / (3/8),  x = 0, 1, 2.
Thus
pX|Y(0|1) = p(0, 1) / P(Y = 1) = (1/8) / (3/8) = 1/3
pX|Y(1|1) = p(1, 1) / P(Y = 1) = (2/8) / (3/8) = 2/3
pX|Y(2|1) = p(2, 1) / P(Y = 1) = 0 / (3/8) = 0
These probabilities can be represented in the following table:

x              0     1     2
P(X=x|Y=1)    1/3   2/3    0

Check: the conditional probabilities sum to one, p(0|1) + p(1|1) + p(2|1) = 1/3 + 2/3 + 0 = 1.
f. Are X and Y independent?
X and Y are independent if
P(X = x, Y = y) = pX(x) · pY(y) for all x and y.
Check first whether p(0, 0) = pX(0) · pY(0):
p(0, 0) = 1/8, while pX(0) · pY(0) = (2/8)(1/8) = 1/32, so p(0, 0) ≠ pX(0) · pY(0).
Hence X and Y are not independent.
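A compact Python sketch (ours, assuming the table above is the full joint pmf) that automates parts c, d, and f: it computes both marginals, the conditional pmf given Y = 1, and tests the independence condition at every pair.

```python
from fractions import Fraction as F

# Joint pmf from the example table: p[(x, y)] = P(X = x, Y = y).
p = {(0, 0): F(1, 8), (0, 1): F(1, 8),
     (1, 1): F(2, 8), (1, 2): F(2, 8),
     (2, 2): F(1, 8), (2, 3): F(1, 8)}
xs, ys = range(3), range(4)

# Marginals: add across the rows and down the columns.
pX = {x: sum(p.get((x, y), F(0)) for y in ys) for x in xs}
pY = {y: sum(p.get((x, y), F(0)) for x in xs) for y in ys}
print(pX)   # 2/8, 4/8, 2/8 (printed in lowest terms)

# Conditional pmf of X given Y = 1.
cond = {x: p.get((x, 1), F(0)) / pY[1] for x in xs}
print(cond)  # 1/3, 2/3, 0

# Independence requires p(x, y) == pX(x) * pY(y) for ALL pairs.
print(all(p.get((x, y), F(0)) == pX[x] * pY[y]
          for x in xs for y in ys))  # False -> X and Y are dependent
```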
Joint probability mass functions:
Let X1, …, Xn be discrete random variables. The joint probability mass function is
P(x1, …, xn) = P(X1 = x1, …, Xn = xn),
where x1, …, xn are possible values of X1, …, Xn, respectively. P(x1, …, xn) satisfies the following conditions:
1. P(x1, …, xn) ≥ 0 for all x1, …, xn;
2. Σ_{x1} ⋯ Σ_{xn} P(x1, …, xn) = 1.
Independence of multiple random variables:
X1, …, Xn are independent if and only if their joint probability mass function factors into the product of the marginals,
P(x1, …, xn) = P(X1 = x1) ⋯ P(Xn = xn), for all x1, …, xn.
In general,
var(Σ_{i=1}^{n} Xi) = Σ_{i=1}^{n} var(Xi) + Σ_{i≠j} cov(Xi, Xj).
The covariance of X and Y is cov(X, Y) = E[(X − µX)(Y − µY)] = E(XY) − E(X)E(Y). It depends on the units of measurement of X and Y.
The correlation coefficient ρ(X, Y) does not depend on the unit of measurement:
ρ(X, Y) = Corr(X, Y) = cov(X, Y) / (σX σY) = Cov(X, Y) / (√Var(X) √Var(Y)).
Properties of ρ(X, Y):
1. −1 ≤ ρ(X, Y) ≤ 1.
So ρ(X, Y) is a standardized measure of the linear dependence of X and Y.
Example: Suppose that X and Y have the joint p.m.f. p(x, y) given in the following table:

                    y
p(x, y)     1      2      3      P(X=x)
      0     0.02   0.05   0.15   0.22
x     1     0.08   0.24   0.05   0.37
      2     0.23   0.15   0.03   0.41
P(Y=y)      0.33   0.44   0.23
E(X) = 0(.22) + 1(.37) + 2(.41) = 1.19
E(Y) = 1(.33) + 2(.44) + 3(.23) = 1.90
E(XY) = 0(1)(.02) + 0(2)(.05) + 0(3)(.15) + 1(1)(.08) + 1(2)(.24) + 1(3)(.05) + 2(1)(.23) + 2(2)(.15) + 2(3)(.03) = 1.95
Consequently,
Cov(X, Y) = 1.95 − 1.19(1.90) = −0.311.
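As a numerical check (our own sketch, not from the notes), the following recomputes E(X), E(Y), E(XY), and Cov(X, Y) from the table, and also the correlation ρ(X, Y), which works out to about −0.544 once the two variances are computed the same way.

```python
import math

# Joint pmf from the table: p[(x, y)] = P(X = x, Y = y).
p = {(0, 1): 0.02, (0, 2): 0.05, (0, 3): 0.15,
     (1, 1): 0.08, (1, 2): 0.24, (1, 3): 0.05,
     (2, 1): 0.23, (2, 2): 0.15, (2, 3): 0.03}

def E(g):
    """Expectation of g(X, Y) via the double sum over the joint pmf."""
    return sum(g(x, y) * pr for (x, y), pr in p.items())

EX, EY, EXY = E(lambda x, y: x), E(lambda x, y: y), E(lambda x, y: x * y)
cov = EXY - EX * EY
rho = cov / math.sqrt((E(lambda x, y: x**2) - EX**2) *
                      (E(lambda x, y: y**2) - EY**2))
print(EX, EY, EXY)                   # approximately 1.19, 1.90, 1.95
print(round(cov, 3), round(rho, 3))  # -0.311 and about -0.544
```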
Joint distribution of two continuous random variables:
If there is a function fX,Y(x, y) such that for every B ⊂ R²
P((X, Y) ∈ B) = ∬_{(x,y)∈B} fX,Y(x, y) dx dy,
then X and Y are jointly continuous and fX,Y is called their joint probability density function.
As X and Y are continuous, the probability that (X, Y) falls in a rectangle is
P(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫_c^d ∫_a^b f(x, y) dx dy.
A joint density satisfies:
1. f(x, y) ≥ 0 for all x and y;
2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1.
Marginal probability density functions:
Let X and Y be jointly continuous random variables. The marginal probability density functions of X and Y are
fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy,
and
fY(y) = ∫_{−∞}^{∞} fX,Y(x, y) dx.
Conditional probability density function:
Let X and Y be jointly continuous random variables. The conditional probability density function of X given Y is
fX|Y(x|y) = fX,Y(x, y) / fY(y),
where fY(y) > 0.
Independence of two continuous random variables:
Let X and Y be jointly continuous random variables. X and Y are independent if and only if
fX,Y(x, y) = fX(x) fY(y) for all x ∈ R and y ∈ R.
Example: Let
f(x, y) = 4xy for 0 < x < 1, 0 < y < 1, and f(x, y) = 0 otherwise.
(a) Find P(X < 1/4, Y < 1/2).
P(X < 1/4, Y < 1/2) = ∫_0^{1/2} ∫_0^{1/4} 4xy dx dy
= ∫_0^{1/2} 4y (x²/2 |_0^{1/4}) dy
= ∫_0^{1/2} 4y (1/32) dy
= (1/8) ∫_0^{1/2} y dy = (1/8)(y²/2 |_0^{1/2}) = (1/8)(1/8) = 1/64.
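The integral is easy to double-check numerically; a minimal sketch (ours) using scipy.integrate.dblquad, whose integrand takes its arguments in the order (y, x):

```python
from scipy import integrate

# Joint density f(x, y) = 4xy on the unit square.
# dblquad expects func(y, x); x runs over the outer limits.
prob, err = integrate.dblquad(lambda y, x: 4 * x * y,
                              0, 0.25,   # x from 0 to 1/4
                              0, 0.5)    # y from 0 to 1/2
print(prob)  # 0.015625 = 1/64
```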
(b) Find the marginal probability density functions of X and Y.
fX(x) = ∫_0^1 4xy dy = 4x ∫_0^1 y dy = 4x (y²/2 |_0^1) = 2x, 0 < x < 1
fY(y) = ∫_0^1 4xy dx = 4y ∫_0^1 x dx = 4y (x²/2 |_0^1) = 2y, 0 < y < 1
(c) Find the conditional probability density function of X given Y, fX|Y(x|y).
fX|Y(x|y) = fX,Y(x, y) / fY(y) = 4xy / 2y = 2x, 0 < x < 1
(d) Are X and Y independent?
Check whether fX,Y(x, y) = fX(x) fY(y):
(2x)(2y) = 4xy = fX,Y(x, y) for all x and y, so X and Y are independent.
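The marginals, the conditional density, and the factorization can also be verified symbolically; a short sympy sketch of ours:

```python
import sympy as sp

x, y = sp.symbols("x y", positive=True)
f = 4 * x * y  # joint density on 0 < x < 1, 0 < y < 1

# Marginals by integrating out the other variable.
f_X = sp.integrate(f, (y, 0, 1))  # 2*x
f_Y = sp.integrate(f, (x, 0, 1))  # 2*y

# Conditional density of X given Y: free of y, as computed above.
f_X_given_Y = sp.simplify(f / f_Y)  # 2*x

# Independence: the joint density factors into the marginals.
print(f_X, f_Y, f_X_given_Y)
print(sp.simplify(f - f_X * f_Y) == 0)  # True
```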
In particular,
E(XY) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy f(x, y) dx dy.
The properties for two variables can be easily extended to three or more variables. Some examples:
F_{X1,…,Xn}(x1, …, xn) = P(X1 ≤ x1, …, Xn ≤ xn)
∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} f(x1, …, xn) dx1 ⋯ dxn = 1
fX1(x1) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x1, x2, x3) dx2 dx3
If X ~ N(µ1, σ1²) and Y ~ N(µ2, σ2²), and X and Y are independent, then
X + Y ~ N(µ1 + µ2, σ1² + σ2²).
Results:
- E(X1 + … + Xn) = E(X1) + … + E(Xn)
If X1, …, Xn are independent, then
- var(X1 + … + Xn) = var(X1) + … + var(Xn)
- f_{X1,…,Xn}(x1, …, xn) = ∏_{i=1}^{n} fXi(xi)
If X1, …, Xn are independent, each with mean µ and variance σ², then
E(Σ_{i=1}^{n} Xi) = nµ and var(Σ_{i=1}^{n} Xi) = nσ².
If in addition each Xi is normal, the sample mean X̄ ~ N(µ, σ²/n).
For example, if σ = 12 and n = 25, then
P(X̄ − µ < 4) = P((X̄ − µ)/(σ/√n) < 4/(12/√25)) = P(Z < 5/3) ≈ 0.952.
Equivalently, by the Central Limit Theorem, for a large sample the sample sum is approximately normal:
Sn ≈ N(nµ, nσ²),
where Sn = X1 + X2 + … + Xn (the sample sum).
Rule of thumb: n > 30 is large.
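To see the rule of thumb at work, here is a small simulation sketch (our own illustration, using an exponential population, which is far from normal): sums of n = 30 draws already behave like N(nµ, nσ²).

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 30, 100_000
mu = sigma = 1.0  # the exponential(1) population has mean 1 and sd 1

# 100,000 sample sums S_n, each the sum of n = 30 exponential draws.
sums = rng.exponential(scale=1.0, size=(reps, n)).sum(axis=1)

# The CLT predicts S_n is approximately N(n*mu, n*sigma^2).
print(sums.mean(), sums.std())  # close to 30 and sqrt(30) = 5.48
z = (sums - n * mu) / (sigma * np.sqrt(n))
print((z < 1.0).mean())         # close to Phi(1) = 0.8413
```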
Example: Suppose that a random sample of size n = 100 is drawn from a population with mean 70 and standard deviation 20. What is the probability that the sample mean X̄ will be less than 73?
P(X̄ < 73) ≈ P((X̄ − µ)/(σ/√n) < (73 − 70)/(20/√100))
= P(Z < 1.5) = 0.9332
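The same answer can be read off scipy's standard normal cdf; a one-step sketch of ours:

```python
from scipy.stats import norm

mu, sigma, n = 70, 20, 100
z = (73 - mu) / (sigma / n ** 0.5)  # standardize the sample mean
print(z, norm.cdf(z))               # 1.5 and about 0.9332
```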
Normal Approximation to the Binomial
Shape of the binomial distribution:
The shape of the binomial distribution depends on the values of n and p.
A Bernoulli trial is any trial with only two possible outcomes (denoted success and failure).
A Bernoulli random variable takes only two values, 0 and 1.
Let p be the success probability and 1 − p the failure probability.
So the Bernoulli distribution is:

value          0      1
probability    1−p    p
The sum of n independent Bernoulli random variables
= the number of successes in n independent trials
= X ~ B(n, p).
Computing P(a ≤ X ≤ b) exactly is not easy for large n! Using the CLT,
P(a ≤ X ≤ b) = P((a − np)/√(np(1−p)) ≤ (X − np)/√(np(1−p)) ≤ (b − np)/√(np(1−p)))
             ≈ P((a − np)/√(np(1−p)) ≤ Z ≤ (b − np)/√(np(1−p))).
Using the z-table, we get the answer.
Approximating a discrete distribution by a continuous distribution:
Example: Suppose we toss a fair coin 20 times. Let X be the random variable representing the number of heads thrown, so X ~ Bin(20, ½).
(Figure: the rectangles represent the binomial distribution and the curve is the normal distribution.)
We want P(9 ≤ X ≤ 11), which is the red shaded area. Notice that the first rectangle starts at 8.5 and the last rectangle ends at 11.5. Using a continuity correction, therefore, our probability becomes P(8.5 < X < 11.5) in the normal distribution. In general,
P(a ≤ X ≤ b) = P(a − 0.5 ≤ X ≤ b + 0.5)
             ≈ P((a − 0.5 − np)/√(np(1−p)) ≤ Z ≤ (b + 0.5 − np)/√(np(1−p))).
Example: Use the normal approximation to the binomial distribution with n = 10 and p = 0.5 to find P(3 ≤ X ≤ 7).
With the continuity correction, P(3 ≤ X ≤ 7) ≈ P(2.5 ≤ X ≤ 7.5) under the normal curve.
Here np = 5 and np(1 − p) = 2.5, so
P(2.5 ≤ X ≤ 7.5) = P((2.5 − np)/√(np(1−p)) ≤ Z ≤ (7.5 − np)/√(np(1−p))) = P(−1.58 ≤ Z ≤ 1.58) = 0.8858.
Exact answer:
P(3 ≤ X ≤ 7) = Σ_{i=3}^{7} (10 choose i)(0.5)^i (0.5)^(10−i) = 0.8906
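To compare the exact value with the continuity-corrected approximation, a short scipy.stats sketch of ours:

```python
from scipy.stats import binom, norm

n, p = 10, 0.5
mu, sd = n * p, (n * p * (1 - p)) ** 0.5  # 5 and sqrt(2.5)

# Exact: P(3 <= X <= 7) = F(7) - F(2) for the Bin(10, 0.5) cdf.
exact = binom.cdf(7, n, p) - binom.cdf(2, n, p)

# Normal approximation with the continuity correction.
approx = norm.cdf((7.5 - mu) / sd) - norm.cdf((2.5 - mu) / sd)

print(exact)   # about 0.8906
print(approx)  # about 0.886 (0.8858 with z-table rounding)
```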