
Lecture 6

SAMPLING AND SAMPLING DISTRIBUTIONS


Lecture Outline
Population and samples
Sampling Distribution of the mean
z-distribution
t-distribution
Sampling Distribution of the variance
Chi-squared distribution
F-Distribution

Spring 2021 DR. MAHA A. HASSANEIN 2


Population and Samples
- Population: The total set of observations we want to make inferences
about. Size N.
- Random Sample: A subset of observations drawn from the population.
Size n.
- Parameter: A characteristic of the population (e.g. mean, variance,
proportion)
- Statistic: A measurement calculated from the sample (e.g. sample
mean, sample variance)



Motivation for Sampling Distributions
Using samples to make inferences about populations
Statistics such as the sample mean are random variables, since they depend on the particular random sample selected.
What does the distribution of a sample statistic look like?
Does it look like the distribution of the individual data points from the underlying population?
Which properties of the distribution of a statistic are the same as those of the population, and which are different?
How is the distribution of a sample statistic affected by the underlying distribution of the individuals?



Sampling
Draw a random sample of size n from a population. Assume the observed data are independent and identically distributed (i.i.d.) random variables
$X_1, X_2, \ldots, X_n$.
A statistic $\theta$ for the sample is a function of the random variables $X_i$.
Examples of statistics: the sample mean and the sample standard deviation.
The computed statistic is itself a random variable.
What is the significance of i.i.d. samples in statistical analysis?
If we repeat the sampling, do we expect the same value of the statistic in every sample?



Illustrative Example
Roll a single die 10 times and record the readings (observations, n = 10).
Repeat this sample 20 times (random samples).
Record the output of each sample as a set.
For each sample, construct the frequency table of the individual die outputs and compute the mean.
Find the overall frequency table of the individual die readings and of the sample means.
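The dice experiment above can be simulated in a few lines. This is a minimal Python sketch assuming a fair six-sided die; the helper names (`roll_samples`, `relative_frequencies`), the fixed seed, and the binning of each sample mean to the nearest whole number (to match the 1 to 6 axis of the results figure) are illustrative choices, not part of the original exercise.

```python
import math
import random
from collections import Counter
from statistics import pstdev

N_ROLLS = 10     # observations per sample (n = 10)
N_SAMPLES = 20   # number of repeated random samples

def roll_samples(n_rolls, n_samples, seed=None):
    """Return n_samples lists, each holding n_rolls fair-die readings 1..6."""
    rng = random.Random(seed)
    return [[rng.randint(1, 6) for _ in range(n_rolls)] for _ in range(n_samples)]

def relative_frequencies(values):
    """Relative frequency table {value: proportion}, ordered by value."""
    counts = Counter(values)
    return {v: counts[v] / len(values) for v in sorted(counts)}

samples = roll_samples(N_ROLLS, N_SAMPLES, seed=1)

# Pool all 200 readings for the table of individual outputs.
individual = [x for s in samples for x in s]

# One mean per sample, then binned to the nearest whole number (halves round up)
# so that both tables share the 1..6 axis.
raw_means = [sum(s) / len(s) for s in samples]
binned_means = [math.floor(m + 0.5) for m in raw_means]

print("individual outputs:", relative_frequencies(individual))
print("sample means:      ", relative_frequencies(binned_means))
print("population sd: individual", round(pstdev(individual), 3),
      "vs means", round(pstdev(raw_means), 3))
```

With a different seed the tables change, which is the point of the last question on the Sampling slide: the statistic varies from sample to sample, but the sample means cluster much more tightly around 3.5 than the individual rolls do.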



Results
[Figure: relative-frequency bar charts over the values 1 to 6, titled "Sampling distribution of the individual dice output" and "Sampling distribution of the sample mean"]




Sampling Distribution of $\bar{X}$
The fact that the sampling distributions of sample means $\bar{X}$ tend to be approximately normal in shape when n is large is described by the Central Limit Theorem.
Definition. The Central Limit Theorem
If a random sample of size n is drawn from a population with mean $\mu$ and variance $\sigma^2$, then, for large n, the sample mean $\bar{X}$ has approximately a normal distribution with mean $\mu$ and variance $\sigma^2/n$.
That is,
$\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$ (approximately).
- Derive the sampling distribution of the mean.
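One way to carry out the requested derivation for the mean and variance of $\bar{X}$ is sketched below. It assumes only that $X_1, \ldots, X_n$ are i.i.d. with mean $\mu$ and variance $\sigma^2$; the approximately normal shape is the extra ingredient supplied by the CLT.

```latex
\begin{aligned}
E(\bar{X}) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
            = \frac{1}{n}\, n\mu = \mu, \\
\operatorname{Var}(\bar{X}) &= \frac{1}{n^2}\operatorname{Var}\!\left(\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
            = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}.
\end{aligned}
```

Independence is used in the second line: all covariance terms in $\operatorname{Var}\left(\sum X_i\right)$ vanish, leaving only the $n$ variances.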

MAHA HASSANEIN 2018


Illustrative Example Using R

We conclude: even though the original data set is not normal in shape (in fact, it is skewed to the right), the sample means are approximately normally distributed.
This illustrates the idea behind the CLT.
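The R demonstration itself is not reproduced in these notes. As a stand-in (not the author's code), the Python sketch below assumes a right-skewed Exponential(rate 1) population and sample size n = 30, and compares the skewness of individual values with the skewness of sample means; the seed and replication counts are arbitrary.

```python
import random
from statistics import mean, pstdev

RATE = 1.0          # assumed Exp(1) population: mean 1, sd 1, skewness 2
N = 30              # size of each sample
REPS = 5000         # number of repeated samples
N_INDIVIDUAL = 20_000  # individual draws used to describe the population shape

def skewness(values):
    """Moment coefficient of skewness: average cubed standardized value."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

rng = random.Random(2024)
individual = [rng.expovariate(RATE) for _ in range(N_INDIVIDUAL)]
sample_means = [sum(rng.expovariate(RATE) for _ in range(N)) / N for _ in range(REPS)]

print("individual values: skewness", round(skewness(individual), 2))
print("sample means:      skewness", round(skewness(sample_means), 2),
      " mean", round(mean(sample_means), 3), " sd", round(pstdev(sample_means), 3))
```

Theory gives a skewness of $2/\sqrt{30} \approx 0.37$ for means of 30 exponential values, so the normal shape is approached gradually as n grows rather than reached exactly at some fixed n.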

18/01/2024 DR. MAHA A. HASSANEIN 9


The sampling distributions of sample means: random samples of sizes n = 2, 5, 12, 30 were taken from uniform, exponential, and normal populations.
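The experiment in this caption can be approximated by simulation. The sketch below is in Python rather than R and assumes the parameterisations Uniform(0, 1), Exponential(rate 1), and Normal(0, 1), which the slide does not state. For each population and each n it compares the simulated standard deviation of $\bar{X}$ with $\sigma/\sqrt{n}$.

```python
import math
import random
from statistics import pstdev

rng = random.Random(7)
REPS = 4000                 # repeated samples per (population, n) pair
SIZES = (2, 5, 12, 30)      # sample sizes listed on the slide

# Assumed parameterisations: (draw one value, population standard deviation sigma)
POPULATIONS = {
    "uniform(0,1)": (rng.random, 1 / math.sqrt(12)),
    "exponential(1)": (lambda: rng.expovariate(1.0), 1.0),
    "normal(0,1)": (lambda: rng.gauss(0.0, 1.0), 1.0),
}

results = {}
for name, (draw, sigma) in POPULATIONS.items():
    for n in SIZES:
        means = [sum(draw() for _ in range(n)) / n for _ in range(REPS)]
        observed = pstdev(means)           # simulated spread of Xbar
        predicted = sigma / math.sqrt(n)   # sigma / sqrt(n)
        results[(name, n)] = (observed, predicted)
        print(f"{name:15s} n={n:2d}  sd(Xbar)={observed:.4f}  sigma/sqrt(n)={predicted:.4f}")
```

The $\sigma/\sqrt{n}$ rule holds for every population shape; what differs between populations is how quickly the histogram of the means becomes bell-shaped.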



Important Remark
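In case $\sigma$ is known and a large sample ($n \ge 25$) is drawn from a population of any distribution, then by the CLT
$\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$ (approximately).
Transform $\bar{X}$ to the standard normal random variable $Z \sim N(0, 1)$:
$Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
Hence, probabilities are computed as follows:
$P(\bar{X} < a) = P\left(Z < \dfrac{a - \mu}{\sigma/\sqrt{n}}\right)$
In case $\sigma$ is unknown and n is large ($n \ge 25$), we can substitute the sample standard deviation S for $\sigma$ and work with the z-distribution as above, taking $\sigma \approx S$.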
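The standardization step can be written as a small function. This is a minimal sketch with hypothetical helper names; it uses the error function from Python's standard library instead of a printed table, and the numbers in the usage lines ($\mu = 50$, $\sigma = 10$, $n = 25$) are made up for illustration.

```python
import math

def std_normal_cdf(z):
    """Phi(z) = P(Z <= z) for Z ~ N(0, 1), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_mean_between(a, b, mu, sigma, n):
    """P(a <= Xbar <= b) using the CLT approximation Xbar ~ N(mu, sigma^2 / n).

    Pass a = -math.inf or b = math.inf for a one-sided probability.
    If sigma is unknown and n is large, the sample standard deviation S may be passed.
    """
    se = sigma / math.sqrt(n)                 # standard error sigma / sqrt(n)
    z_a, z_b = (a - mu) / se, (b - mu) / se   # standardize both endpoints
    return std_normal_cdf(z_b) - std_normal_cdf(z_a)

# Hypothetical numbers: mu = 50, sigma = 10, n = 25, so the standard error is 2.
print(round(prob_mean_between(48, 52, mu=50, sigma=10, n=25), 4))         # P(-1 <= Z <= 1)
print(round(prob_mean_between(53, math.inf, mu=50, sigma=10, n=25), 4))   # P(Z > 1.5)
```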



Standard Normal Probabilities
The values in the table are the areas between zero and the z-score, that is, P(0 < Z < z-score).

z   0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
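Every entry of the table equals $\Phi(z) - 0.5$, so it can be regenerated from the error function in Python's standard library. A small sketch (the helper name `table_entry` is hypothetical) that reprints the first rows, rounded to four decimals like the printed values:

```python
import math

def table_entry(z):
    """Area under the standard normal curve between 0 and z: P(0 < Z < z) = Phi(z) - 0.5."""
    return 0.5 * math.erf(z / math.sqrt(2.0))

# Rebuild the first three rows; the column offset c / 100 is the second decimal of z.
print("z    " + "   ".join(f"{c / 100:.2f}" for c in range(10)))
for r in range(3):
    z_row = r / 10
    print(f"{z_row:.1f}  " + " ".join(f"{table_entry(z_row + c / 100):.4f}" for c in range(10)))
```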



Example 1
The fracture strengths of a certain type of glass average 14 (in
thousands of pounds per square inch) and have a standard
deviation of 2.
a. What is the probability that the average fracture
strength for 100 randomly selected pieces of this glass
exceeds 14.5?
b. Find an interval that includes the average fracture
strength for 100 randomly selected pieces of this glass
with probability 0.95.



Solution:
According to the Central Limit Theorem, the average strength $\bar{X}$ has approximately a normal distribution with mean 14 and standard deviation $\sigma/\sqrt{n} = 2/\sqrt{100} = 0.2$.

a. $P(\bar{X} > 14.5) = P\left(Z > \dfrac{14.5 - 14}{0.2}\right) = P(Z > 2.5) = 0.5 - 0.4938 = 0.0062$.
The probability of getting an average value (for a sample of size 100) more than 0.5 units above the population mean is very small (see Figure).

b. We are interested in determining an interval (a, b) such that $P(a \le \bar{X} \le b) = 0.95$.
Since $P(-1.96 \le Z \le 1.96) = 0.95$, such a and b are obtained as
a = 14 − 1.96(0.2) = 13.608 ≈ 13.6 and
b = 14 + 1.96(0.2) = 14.392 ≈ 14.4.
As illustrated by the Figure, approximately 95% of the sample mean fracture strengths, for samples of size 100, should lie between 13.6 and 14.4.
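Both parts of Example 1 reduce to a few lines of arithmetic. A minimal Python sketch, using the error function in place of the printed table, so the fourth decimal can differ slightly from a table look-up:

```python
import math

def phi(z):
    """Standard normal CDF, P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

MU, SIGMA, N = 14.0, 2.0, 100     # Example 1, in thousands of psi
se = SIGMA / math.sqrt(N)          # standard error = 0.2

# (a) P(Xbar > 14.5) = P(Z > (14.5 - 14) / 0.2) = P(Z > 2.5)
p_a = 1.0 - phi((14.5 - MU) / se)

# (b) central 95% interval mu -/+ 1.96 * se, since P(-1.96 <= Z <= 1.96) = 0.95
Z_CRIT = 1.96
interval = (MU - Z_CRIT * se, MU + Z_CRIT * se)

print(f"(a) P(Xbar > 14.5) = {p_a:.4f}")
print(f"(b) interval = ({interval[0]:.3f}, {interval[1]:.3f})")
```

Before rounding, the interval is (13.608, 14.392), which the slide rounds to (13.6, 14.4).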



Example 2:
A certain machine that is used to fill bottles with soda has been observed over a long period of time, and the variance in the amounts filled is found to be approximately $\sigma^2 = 1$ square ounce. However, the mean amount filled, $\mu$, depends on an adjustment that may change from day to day or from operator to operator.
a. If 25 observations on the amount dispensed (in ounces) are taken on a
given day (all with the same machine setting), find the probability that the
sample mean will be within 0.3 ounces of the true population mean for
that setting.
b. How many observations should be taken in the sample so that the sample
mean will be within 0.3 ounces of the population mean with probability
0.95?



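Solution:

a. We assume that n = 25 is large enough for the sample mean $\bar{X}$ to have approximately a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{25} = 1/\sqrt{25} = 0.2$. Then
$P(|\bar{X} - \mu| \le 0.3) = P\left(-\dfrac{0.3}{0.2} \le Z \le \dfrac{0.3}{0.2}\right) = P(-1.5 \le Z \le 1.5) = 2(0.4332) = 0.8664$.

b. To find n such that
$P(|\bar{X} - \mu| \le 0.3) = P[-0.3 \le \bar{X} - \mu \le 0.3] = 0.95$:
as $\sigma = 1$, the standard error is $\sigma/\sqrt{n} = 1/\sqrt{n}$, so we get
$P\left[-0.3\sqrt{n} \le \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 0.3\sqrt{n}\right] = 0.95$.
From the table, $P(-1.96 \le Z \le 1.96) = 0.95$.
Comparing, we get $0.3\sqrt{n} = 1.96 \Rightarrow n = \left(\dfrac{1.96}{0.3}\right)^2 = 42.68$.

Thus, 43 observations will be needed for the sample mean to have a 95% chance of being within 0.3 ounce of the population mean.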
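The same calculation in code, as a minimal sketch (the error function stands in for the table). Part (b) solves $0.3\sqrt{n}/\sigma = 1.96$ for n and rounds up, since the sample size must be a whole number and rounding down would fall short of 0.95.

```python
import math

def phi(z):
    """Standard normal CDF, P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

SIGMA = 1.0    # known population sd (sigma^2 = 1), Example 2
TOL = 0.3      # required closeness of Xbar to mu, in ounces

# (a) n = 25: P(|Xbar - mu| <= 0.3) = P(-1.5 <= Z <= 1.5)
z = TOL / (SIGMA / math.sqrt(25))
p_a = phi(z) - phi(-z)

# (b) solve TOL * sqrt(n) / SIGMA = 1.96 for n, then round UP to a whole sample size
Z_CRIT = 1.96
n_exact = (Z_CRIT * SIGMA / TOL) ** 2
n_needed = math.ceil(n_exact)

print(f"(a) {p_a:.4f}")
print(f"(b) n = {n_exact:.2f} -> take n = {n_needed}")
```

With n = 42 the probability would be about 0.948, just below the required 0.95, which is why 43 is the answer.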



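T-distribution
Theorem: If $\bar{X}$ is the mean of a random sample of size n taken from a normal population having mean $\mu$ and variance $\sigma^2$, and
$S^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$,
then
$t = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$
is a value of a random variable T having the t-distribution with $\nu = n - 1$ degrees of freedom.
This distribution has mean 0, and its standard deviation approaches 1 as $n \to \infty$ (the t-distribution approaches the standard normal).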
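A quick Monte Carlo check of the theorem, assuming a N(0, 1) population and n = 11 (so $\nu = 10$); the seed and the number of replications are arbitrary. The simulated t-values should exceed the t-table cutoff 2.228 (the value $t_{0.025}$ for $\nu = 10$) about 2.5% of the time, but exceed the normal cutoff 1.96 noticeably more often, reflecting the heavier tails of the t-distribution.

```python
import math
import random

rng = random.Random(11)
N = 11                # sample size, so nu = n - 1 = 10 degrees of freedom
MU, SIGMA = 0.0, 1.0  # assumed normal population N(0, 1)
REPS = 50_000         # number of simulated samples

def t_statistic(sample, mu):
    """t = (Xbar - mu) / (S / sqrt(n)), where S^2 uses the divisor n - 1."""
    n = len(sample)
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
    return (xbar - mu) / math.sqrt(s2 / n)

ts = [t_statistic([rng.gauss(MU, SIGMA) for _ in range(N)], MU) for _ in range(REPS)]

# Upper-tail frequencies at the t cutoff (2.228) and at the normal cutoff (1.96).
frac_above_t_cut = sum(t > 2.228 for t in ts) / REPS
frac_above_z_cut = sum(t > 1.96 for t in ts) / REPS
print(f"P(T > 2.228) ~ {frac_above_t_cut:.4f}  (t-table value: 0.025)")
print(f"P(T > 1.960) ~ {frac_above_z_cut:.4f}  (would be 0.025 if T were N(0, 1))")
```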



T-distribution and Standard Normal distribution
[Figure: density curves of the standard normal distribution and the t-distribution]



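Table for One-Tailed t-Distribution
[Figure: t-distribution density curve; the area to the right of $t_\alpha$ is $\alpha$ and the area to its left is $1 - \alpha$]
$P(T > t_\alpha) = \alpha$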



Examples:
From the tables: $P(T > t_\alpha) = \alpha$ and $P(T < t_\alpha) = 1 - \alpha$;
$P(T < -t_\alpha) = \alpha$ and $P(T > -t_\alpha) = 1 - \alpha$.
Ex.1: Find the t-value with 10 degrees of freedom that leaves an area $\alpha = 0.025$ to the right.
Ans. $t_\alpha = t_{0.025} = 2.228$
Ex.2: Find the t-value with 10 degrees of freedom that leaves an area of 0.95 to the right.
Ans. $t_{0.95} = -t_{0.05} = -1.812$
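The table look-ups can be cross-checked numerically. The sketch below assumes no statistics library is available: it integrates the t density with Simpson's rule and inverts the upper-tail area by bisection. The function names, the search bracket, and the integration settings are illustrative assumptions chosen for readability rather than speed; in practice a library routine such as `scipy.stats.t.ppf(1 - alpha, nu)` would give $t_\alpha$ directly.

```python
import math

def t_pdf(x, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c) * (1 + x * x / nu) ** (-(nu + 1) / 2)

def t_upper_tail(x, nu, steps=2000):
    """P(T > x) for x >= 0: 0.5 minus the area from 0 to x (composite Simpson rule)."""
    h = x / steps
    area = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - area * h / 3

def t_quantile(alpha, nu):
    """Return t_alpha with P(T > t_alpha) = alpha, found by bisection."""
    if alpha > 0.5:
        return -t_quantile(1 - alpha, nu)   # symmetry: t_{1 - alpha} = -t_alpha
    lo, hi = 0.0, 100.0                     # assumed bracket, ample for the alphas used here
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, nu) > alpha:   # tail area still too large: move right
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_quantile(0.025, 10), 3))   # Ex.1: t_0.025 with nu = 10
print(round(t_quantile(0.95, 10), 3))    # Ex.2: t_0.95 with nu = 10
print(round(t_quantile(0.975, 14), 3))   # area 0.025 to the left, nu = 14
```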



Examples
Ex.3: Find the t-value with 14 degrees of freedom that leaves an area of 0.025 to the left.
Ans. $t_{0.975} = -t_{0.025} = -2.145$

Ex.4: Find $P(-t_{0.025} < T < t_{0.05})$.
Ans. Since $t_{0.05}$ leaves an area of 0.05 to the right, and $-t_{0.025}$ leaves an area of 0.025 to the left, we find a total area of $1 - 0.05 - 0.025 = 0.925$.
Hence $P(-t_{0.025} < T < t_{0.05}) = 0.925$.



Example 3
Find k such that $P(k < T < -1.761) = 0.045$ for a random sample of size 15 selected from a normal distribution, where $t = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$.

Answer. From the table, 1.761 corresponds to $t_{0.05}$ when $\nu = 14$.
Therefore $-t_{0.05} = -1.761$, which leaves an area of 0.05 to its left.
Let $k = -t_\alpha$; then
$0.045 = 0.05 - \alpha \Rightarrow \alpha = 0.005$.
From the table, with $\nu = 14$,
$k = -t_{0.005} = -2.977$ and
$P(-2.977 < T < -1.761) = 0.045$.
[Figure: t-density with the area 0.045 shaded between k and −1.761]
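Example 3 can also be solved numerically as a check on the table look-up. The sketch below (hypothetical helper names; Simpson's rule and bisection are chosen for self-containment, not taken from the lecture) searches for k with $P(k < T < -1.761) = 0.045$ when $\nu = 14$. Because the table entries are rounded to three decimals, the computed k should land close to, but not necessarily exactly on, −2.977.

```python
import math

def t_pdf(x, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c) * (1 + x * x / nu) ** (-(nu + 1) / 2)

def t_cdf(x, nu, steps=2000):
    """P(T <= x) = 0.5 + signed area from 0 to x (composite Simpson rule; h < 0 if x < 0)."""
    h = x / steps
    area = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + area * h / 3

def find_k(area, upper, nu):
    """Find k < upper with P(k < T < upper) = area, by bisection on k."""
    cdf_upper = t_cdf(upper, nu)
    lo, hi = -100.0, upper                 # assumed search bracket for k
    for _ in range(40):
        mid = (lo + hi) / 2
        if cdf_upper - t_cdf(mid, nu) > area:   # interval too wide: move k right
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

NU = 14                                    # sample size 15, so nu = n - 1 = 14
k = find_k(0.045, -1.761, NU)
print(f"k = {k:.3f}")
print(f"P(k < T < -1.761) = {t_cdf(-1.761, NU) - t_cdf(k, NU):.4f}")
```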



[Table: critical values $t_\alpha$ of the t-distribution for selected $\alpha$ and degrees of freedom $\nu$]



Textbook
Chapter 6, Sections 6.1, 6.2, 6.3

Reference
