Ken Black QA ch03
th
ed.
by Ken Black
Chapter 3
Descriptive
Statistics
Discrete Distributions
PowerPoint presentations prepared by Lloyd Jaisingh,
Morehead State University
Learning Objectives
Distinguish between measures of central
tendency, measures of variability, measures
of shape, and measures of association.
Understand the meanings of mean, median,
mode, quartile, percentile, and range.
Compute mean, median, mode, percentile,
quartile, range, variance, standard deviation,
and mean absolute deviation on ungrouped
data.
Differentiate between sample and
population variance and standard deviation.
Learning Objectives -- Continued
Understand the meaning of standard
deviation as it is applied by using the
empirical rule and Chebyshev's theorem.
Compute the mean, mode, standard
deviation, and variance on grouped data.
Understand skewness, kurtosis, and box and
whisker plots.
Compute a coefficient of correlation and
interpret it.
Measures of Central Tendency:
Ungrouped Data
Measures of central tendency yield
information about the center, or middle part,
of a group of numbers.
Common Measures of central tendency
Mode
Median
Mean
Percentiles
Quartiles
Mode
The most frequently occurring value in a
data set
Applicable to all levels of data
measurement (nominal, ordinal, interval,
and ratio)
Bimodal -- Data sets that have two modes
Multimodal -- Data sets that contain more
than two modes
Mode -- Example
35 37 37 39 40 40 41 41
43 43 43 43 44 44 44 44
44 45 45 46 46 46 46 48
The mode is 44.
44 is the most frequently
occurring data value.
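The mode of the example data can be checked with a short Python sketch. The function name `modes` is illustrative, not from the slides; it returns every value tied for the highest frequency, so bimodal and multimodal data sets are handled as well.

```python
from collections import Counter

def modes(values):
    """Return all values that share the highest frequency, in sorted order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

data = [35, 37, 37, 39, 40, 40, 41, 41, 43, 43, 43, 43,
        44, 44, 44, 44, 44, 45, 45, 46, 46, 46, 46, 48]

print(modes(data))             # [44]
print(modes([1, 1, 2, 2, 3]))  # [1, 2], a bimodal data set
```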
Median
Middle value in an ordered array of
numbers
Applicable for ordinal, interval, and ratio
data
Not applicable for nominal data
Unaffected by extremely large and
extremely small values
Median: Computational Procedure
First Procedure
Arrange the observations in an ordered array.
If there is an odd number of terms, the median
is the middle term of the ordered array.
If there is an even number of terms, the median
is the average of the middle two terms.
Second Procedure
The median's position in an ordered array is
given by (n+1)/2.
Median: Example
with an Odd Number of Terms
Ordered Array
3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21 22
There are 17 terms in the ordered array.
Position of median = (n+1)/2 = (17+1)/2 = 9
The median is the 9th term, which is 15.
If the 22 is replaced by 100, the median is
15.
If the 3 is replaced by -103, the median is
15.
Median: Example
with an Even Number of Terms
Ordered Array
3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21
There are 16 terms in the ordered array.
Position of median = (n+1)/2 = (16+1)/2 = 8.5
The median is between the 8th and 9th terms,
14.5.
If the 21 is replaced by 100, the median is
14.5.
If the 3 is replaced by -88, the median is 14.5.
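Both median examples, and the claim that extreme values leave the median unchanged, can be reproduced with a minimal sketch. The function name `median` is an illustrative choice; it follows the first procedure (middle term, or the average of the middle two terms).

```python
def median(values):
    """Middle term of the ordered array, or the average of the middle two terms."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]
even = odd[:-1]  # the 16-term array from the second example

print(median(odd))                # 15
print(median(even))               # 14.5
print(median(odd[:-1] + [100]))   # 15, replacing 22 by 100 changes nothing
print(median([-88] + even[1:]))   # 14.5, replacing 3 by -88 changes nothing
```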
Arithmetic Mean
Commonly called the mean
Is the average of a group of numbers
Applicable for interval and ratio data
Not applicable for nominal or ordinal data
Affected by each value in the data set,
including extreme values
Computed by summing all values in the
data set and dividing the sum by the number
of values in the data set
Population Mean
$\mu = \frac{\sum X}{N} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{24 + 13 + 19 + 26 + 11}{5} = \frac{93}{5} = 18.6$
Sample Mean
$\bar{X} = \frac{\sum X}{n} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{57 + 86 + 42 + 38 + 90 + 66}{6} = \frac{379}{6} = 63.167$
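The population and sample means use the same arithmetic; only the notation differs. A minimal sketch, with `mean` as an illustrative helper name, reproduces both worked examples.

```python
def mean(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)

population = [24, 13, 19, 26, 11]
sample = [57, 86, 42, 38, 90, 66]

print(f"{mean(population):.1f}")  # 18.6
print(f"{mean(sample):.3f}")      # 63.167
```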
Percentiles
Measures of central tendency that divide a
group of data into 100 parts
At least n% of the data lie below the nth
percentile, and at most (100 - n)% of the data
lie above the nth percentile
Example: 90th percentile indicates that at least
90% of the data lie below it, and at most 10%
of the data lie above it
The median and the 50th percentile have the
same value.
Applicable for ordinal, interval, and ratio data
Not applicable for nominal data
Percentiles: Computational Procedure
Organize the data into an ascending ordered
array.
Calculate the percentile location:
$i = \frac{P}{100}(n)$
where
P = percentile
i = percentile location
n = sample size
Determine the percentile's location and its
value.
If i is a whole number, the percentile is the
average of the values at the i and (i + 1)
positions.
If i is not a whole number, the percentile is at
the whole number part of (i + 1) in the ordered
array.
Percentiles: Example
Raw Data: 14, 12, 19, 23, 5, 13, 28, 17
Ordered Array: 5, 12, 13, 14, 17, 19, 23, 28
Location of 30th percentile:
$i = \frac{30}{100}(8) = 2.4$
The location index, i, is not a whole number; i + 1
= 2.4 + 1 = 3.4; the whole number portion is 3; the
30th percentile is at the 3rd location of the array;
the 30th percentile is 13.
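The location rule can be sketched in Python as follows. The function name `percentile` and the restriction to 0 < P < 100 are assumptions made for this illustration. Statistical libraries often use other interpolation rules, so their results may differ from this procedure.

```python
def percentile(values, p):
    """Percentile by the location rule i = (P / 100) * n."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    i = p * len(ordered) / 100       # percentile location
    if i == int(i):                  # whole number: average positions i and i + 1
        i = int(i)
        return (ordered[i - 1] + ordered[i]) / 2
    return ordered[int(i + 1) - 1]   # otherwise: whole-number part of i + 1 (1-based)

raw = [14, 12, 19, 23, 5, 13, 28, 17]
print(percentile(raw, 30))  # 13
```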
Quartiles
Measures of central tendency that divide a group
of data into four subgroups
$Q_1$: 25% of the data set is below the first quartile
$Q_2$: 50% of the data set is below the second quartile
$Q_3$: 75% of the data set is below the third quartile
$Q_1$ is equal to the 25th percentile
$Q_2$ is located at the 50th percentile and equals the median
$Q_3$ is equal to the 75th percentile
Quartile values are not necessarily members of the
data set
Quartiles: Example
Ordered array: 106, 109, 114, 116, 121, 122,
125, 129
$Q_1$: $i = \frac{25}{100}(8) = 2$, so $Q_1 = \frac{109 + 114}{2} = 111.5$
$Q_2$: $i = \frac{50}{100}(8) = 4$, so $Q_2 = \frac{116 + 121}{2} = 118.5$
$Q_3$: $i = \frac{75}{100}(8) = 6$, so $Q_3 = \frac{122 + 125}{2} = 123.5$
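Because the quartiles are the 25th, 50th, and 75th percentiles, the same location rule yields all three. The sketch below is self-contained; `percentile` and `quartiles` are illustrative names, not part of the slides.

```python
def percentile(values, p):
    """Percentile by the location rule i = (P / 100) * n, for 0 < P < 100."""
    ordered = sorted(values)
    i = p * len(ordered) / 100
    if i == int(i):                  # whole number: average positions i and i + 1
        i = int(i)
        return (ordered[i - 1] + ordered[i]) / 2
    return ordered[int(i + 1) - 1]   # otherwise: whole-number part of i + 1

def quartiles(values):
    """Return (Q1, Q2, Q3) as the 25th, 50th, and 75th percentiles."""
    return tuple(percentile(values, p) for p in (25, 50, 75))

ordered_array = [106, 109, 114, 116, 121, 122, 125, 129]
print(quartiles(ordered_array))  # (111.5, 118.5, 123.5)
```

None of the three quartiles is a member of the data set, which illustrates the last point on the Quartiles slide.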
Variability
[Figure: two cash-flow panels plotted around the mean. No Variability in Cash Flow (same amounts); Variability in Cash Flow (different amounts).]
Measures of Variability:
Ungrouped Data
Measures of variability describe the spread
or the dispersion of a set of data.
Common Measures of Variability
Range
Interquartile Range
Mean Absolute Deviation
Variance
Standard Deviation
Z scores
Coefficient of Variation
Range
The difference between the largest and the
smallest values in a set of data
Simple to compute
Ignores all data points except the
two extremes
Example:
35 37 37 39 40 40 41 41
43 43 43 43 44 44 44 44
44 45 45 46 46 46 46 48
Range = Largest - Smallest = 48 - 35 = 13
Interquartile Range
Range of values between the first and third
quartiles
Range of the middle 50% of the ordered data
set
Less influenced by extremes
Interquartile Range = $Q_3 - Q_1$
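A small sketch of both spread measures. The range uses the 24-value data set from the range example; the interquartile range is illustrated with the quartiles from the quartiles example (Q1 = 111.5, Q3 = 123.5), a pairing chosen here only for illustration. The function names are hypothetical.

```python
def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

def interquartile_range(q1, q3):
    """Spread of the middle 50% of the ordered data: Q3 - Q1."""
    return q3 - q1

data = [35, 37, 37, 39, 40, 40, 41, 41, 43, 43, 43, 43,
        44, 44, 44, 44, 44, 45, 45, 46, 46, 46, 46, 48]

print(data_range(data))                   # 13
print(interquartile_range(111.5, 123.5))  # 12.0
```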
Deviation from the Mean
Data set: 5, 9, 16, 17, 18
Mean: $\mu = 13$
Deviations $(X - \mu)$ from the mean: -8, -4, +3, +4, +5
[Figure: number line from 0 to 20 marking each deviation from the mean: -8, -4, +3, +4, +5.]
Sample Standard Deviation
Square root of the
sample variance
$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} = 221{,}288.67$
$S = \sqrt{S^2} = \sqrt{221{,}288.67} = 470.41$
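The raw data behind the 663,866 sum of squares is not shown on the slide, so the sketch below uses the data set 5, 9, 16, 17, 18 from the Deviation from the Mean slide instead, treated once as a sample and once as a population to show the difference in divisors. The function names are illustrative. The last line only rechecks the slide's arithmetic.

```python
import math

def sum_of_squared_deviations(values):
    """Sum of (x - mean)^2 over all values."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)

def sample_variance(values):
    """Divide by n - 1."""
    return sum_of_squared_deviations(values) / (len(values) - 1)

def population_variance(values):
    """Divide by N."""
    return sum_of_squared_deviations(values) / len(values)

data = [5, 9, 16, 17, 18]  # assumption: reused from the deviation example

print(sample_variance(data), round(math.sqrt(sample_variance(data)), 2))          # 32.5 5.7
print(population_variance(data), round(math.sqrt(population_variance(data)), 2))  # 26.0 5.1

# Arithmetic check of the slide's values: 663,866 / 3, then the square root.
print(f"{math.sqrt(663866 / 3):.2f}")  # 470.41
```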
Uses of Standard Deviation
Indicator of financial risk
Quality Control
construction of quality control charts
process capability studies
Comparing populations
household incomes in two cities
employee absenteeism at two plants
Standard Deviation as an
Indicator of Financial Risk
                      Annualized Rate of Return
Financial Security    $\mu$    $\sigma$
A                     15%      3%
B                     15%      7%
Empirical Rule
Data are normally distributed (or approximately
normal)
Distance from the Mean    Percentage of Values Falling Within Distance
$\mu \pm 1\sigma$         68
$\mu \pm 2\sigma$         95
$\mu \pm 3\sigma$         99.7
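The three percentages can be checked against an exactly normal distribution. This sketch relies on the standard identity P(|Z| < k) = erf(k / sqrt(2)) for a standard normal Z; the helper name `share_within` is an assumption of this example. The empirical rule's 68 and 95 are rounded versions of the computed values.

```python
import math

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, f"{share_within(k) * 100:.1f}")
# 1 68.3
# 2 95.4
# 3 99.7
```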
Chebyshev's Theorem
Applies to all distributions
$P(\mu - k\sigma < X < \mu + k\sigma) \geq 1 - \frac{1}{k^2}$ for $k > 1$
Chebyshev's Theorem
Applies to all distributions
Number of Standard Deviations    Distance from the Mean    Minimum Proportion of Values Falling Within Distance
K = 2                            $\mu \pm 2\sigma$         $1 - 1/2^2 = 0.75$
K = 3                            $\mu \pm 3\sigma$         $1 - 1/3^2 = 0.89$
K = 4                            $\mu \pm 4\sigma$         $1 - 1/4^2 = 0.94$
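The minimum proportions follow directly from 1 - 1/k². A minimal sketch, with `chebyshev_minimum` as an illustrative function name:

```python
def chebyshev_minimum(k):
    """Minimum proportion of values within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2

for k in (2, 3, 4):
    print(k, f"{chebyshev_minimum(k):.2f}")
# 2 0.75
# 3 0.89
# 4 0.94
```

The bound holds for any distribution, which is why it is weaker than the empirical rule's percentages for normal data.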
Coefficient of Variation
Ratio of the standard deviation to the mean,
expressed as a percentage
Measurement of relative dispersion
$C.V. = \frac{\sigma}{\mu}(100)$
Coefficient of Variation
$\mu_1 = 29$, $\sigma_1 = 4.6$
$C.V._1 = \frac{\sigma_1}{\mu_1}(100) = \frac{4.6}{29}(100) = 15.86$
$\mu_2 = 84$, $\sigma_2 = 10$
$C.V._2 = \frac{\sigma_2}{\mu_2}(100) = \frac{10}{84}(100) = 11.90$
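The two coefficients can be recomputed in a few lines. The function name is illustrative. Note that the second data set has the larger standard deviation (10 versus 4.6) but the smaller relative dispersion.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean * 100

cv1 = coefficient_of_variation(4.6, 29)
cv2 = coefficient_of_variation(10, 84)

print(f"{cv1:.2f}")  # 15.86
print(f"{cv2:.2f}")  # 11.90
```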
Measures of Central Tendency
and Variability: Grouped Data
Measures of Central Tendency
Mean
Median
Mode
Measures of Variability
Variance
Standard Deviation
Mean of Grouped Data
Weighted average of class midpoints
Class frequencies are the weights
$\mu = \frac{\sum fM}{\sum f} = \frac{\sum fM}{N} = \frac{f_1 M_1 + f_2 M_2 + f_3 M_3 + \cdots + f_i M_i}{f_1 + f_2 + f_3 + \cdots + f_i}$
Calculation of Grouped Mean
Class Interval    Frequency    Class Midpoint    fM
20-under 30       6            25                150
30-under 40       18           35                630
40-under 50       11           45                495
50-under 60       11           55                605
60-under 70       3            65                195
70-under 80       1            75                75
Total             50                             2150
$\mu = \frac{\sum fM}{\sum f} = \frac{2150}{50} = 43.0$
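The grouped mean is a frequency-weighted average of the class midpoints. In this sketch each class is a tuple (lower limit, upper limit, frequency); that representation and the function name are assumptions of the example.

```python
# (lower limit, upper limit, frequency) for each class interval
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]

def grouped_mean(classes):
    """Sum of f * M divided by the sum of f, where M is the class midpoint."""
    total_f = sum(f for _, _, f in classes)
    total_fm = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return total_fm / total_f

print(grouped_mean(classes))  # 43.0
```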
Median of Grouped Data
$\text{Median} = L + \frac{\frac{N}{2} - cf_p}{f_{med}}(W)$
Where:
$L$ = the lower limit of the median class
$cf_p$ = cumulative frequency of class preceding the median class
$f_{med}$ = frequency of the median class
$W$ = width of the median class
$N$ = total of frequencies
Median of Grouped Data -- Example
Class Interval    Frequency    Cumulative Frequency
20-under 30       6            6
30-under 40       18           24
40-under 50       11           35
50-under 60       11           46
60-under 70       3            49
70-under 80       1            50
N = 50
$Md = L + \frac{\frac{N}{2} - cf_p}{f_{med}}(W) = 40 + \frac{\frac{50}{2} - 24}{11}(10) = 40.909$
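A sketch of the grouped-median formula. The median class is taken to be the first class whose cumulative frequency reaches N/2; the tuple layout (lower limit, upper limit, frequency) and the function name are assumptions of this example.

```python
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]

def grouped_median(classes):
    """L + ((N/2 - cf_p) / f_med) * W, using the first class that reaches N/2."""
    n = sum(f for _, _, f in classes)
    cf = 0  # cumulative frequency of the classes before the current one
    for lo, hi, f in classes:
        if cf + f >= n / 2:
            return lo + (n / 2 - cf) / f * (hi - lo)
        cf += f
    raise ValueError("classes must contain at least one observation")

print(f"{grouped_median(classes):.3f}")  # 40.909
```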
Mode of Grouped Data
Midpoint of the modal class
Modal class has the greatest frequency
Class Interval    Frequency
20-under 30       6
30-under 40       18
40-under 50       11
50-under 60       11
60-under 70       3
70-under 80       1
$\text{Mode} = \frac{30 + 40}{2} = 35$
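For completeness, the grouped mode can be sketched the same way. The function name and the tuple layout (lower limit, upper limit, frequency) are illustrative. If two classes tie for the greatest frequency, this version simply returns the first one.

```python
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]

def grouped_mode(classes):
    """Midpoint of the class with the greatest frequency (first one on ties)."""
    lo, hi, _ = max(classes, key=lambda c: c[2])
    return (lo + hi) / 2

print(grouped_mode(classes))  # 35.0
```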
Variance and Standard Deviation
of Grouped Data
Population
$\sigma^2 = \frac{\sum f(M - \mu)^2}{N}$
$\sigma = \sqrt{\sigma^2}$
Sample
$S^2 = \frac{\sum f(M - \bar{X})^2}{n - 1}$
$S = \sqrt{S^2}$
Population Variance and Standard
Deviation of Grouped Data
Class Interval    f     M     fM      $M - \mu$    $(M - \mu)^2$    $f(M - \mu)^2$
20-under 30       6     25    150     -18          324              1944
30-under 40       18    35    630     -8           64               1152
40-under 50       11    45    495     2            4                44
50-under 60       11    55    605     12           144              1584
60-under 70       3     65    195     22           484              1452
70-under 80       1     75    75      32           1024             1024
Total             50          2150                                  7200
$\sigma^2 = \frac{\sum f(M - \mu)^2}{N} = \frac{7200}{50} = 144$
$\sigma = \sqrt{\sigma^2} = \sqrt{144} = 12$
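The table's columns map directly onto a short computation. This is a sketch of the population formulas only; the tuple layout (lower limit, upper limit, frequency) and the function name are assumptions of the example.

```python
import math

classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]

def grouped_population_variance(classes):
    """Sum of f * (M - mu)^2 divided by N, where M is the class midpoint."""
    n = sum(f for _, _, f in classes)
    mu = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n
    return sum(f * ((lo + hi) / 2 - mu) ** 2 for lo, hi, f in classes) / n

variance = grouped_population_variance(classes)
print(variance)             # 144.0
print(math.sqrt(variance))  # 12.0
```

For the sample version, the divisor would be n - 1 instead of N, as in the sample formula on the preceding slide.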
Measures of Shape
Skewness
Absence of symmetry
Extreme values in one side of a distribution
Kurtosis
Peakedness of a distribution
Leptokurtic: high and thin
Mesokurtic: normal shape
Platykurtic: flat and spread out
Box and Whisker Plots
Graphic display of a distribution
Reveals skewness
Symmetrical and Skewness
[Figure: three distribution curves: Symmetrical; Right or Positively Skewed; Left or Negatively Skewed.]
Relationship of Mean, Median and Mode
Coefficient of Skewness
Summary measure for skewness
$S_k = \frac{3(\mu - M_d)}{\sigma}$
If $S_k < 0$, the distribution is negatively skewed
(skewed to the left).
If $S_k = 0$, the distribution is symmetric (not
skewed).
If $S_k > 0$, the distribution is positively skewed
(skewed to the right).
Coefficient of Skewness
$\mu_1 = 23$, $M_{d_1} = 26$, $\sigma_1 = 12.3$
$S_1 = \frac{3(\mu_1 - M_{d_1})}{\sigma_1} = \frac{3(23 - 26)}{12.3} = -0.73$
$\mu_2 = 26$, $M_{d_2} = 26$, $\sigma_2 = 12.3$
$S_2 = \frac{3(\mu_2 - M_{d_2})}{\sigma_2} = \frac{3(26 - 26)}{12.3} = 0$
$\mu_3 = 29$, $M_{d_3} = 26$, $\sigma_3 = 12.3$
$S_3 = \frac{3(\mu_3 - M_{d_3})}{\sigma_3} = \frac{3(29 - 26)}{12.3} = +0.73$
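The three cases can be reproduced and classified with a short sketch. The function names `skewness_coefficient` and `describe` are illustrative, not from the slides.

```python
def skewness_coefficient(mean, median, std_dev):
    """Coefficient of skewness: 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / std_dev

def describe(sk):
    """Translate the sign of the coefficient into the slide's wording."""
    if sk < 0:
        return "negatively skewed"
    if sk > 0:
        return "positively skewed"
    return "symmetric"

for mean in (23, 26, 29):
    sk = skewness_coefficient(mean, 26, 12.3)
    print(mean, f"{sk:+.2f}", describe(sk))
# 23 -0.73 negatively skewed
# 26 +0.00 symmetric
# 29 +0.73 positively skewed
```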
Types of Kurtosis
Leptokurtic Distribution
Platykurtic Distribution
Mesokurtic Distribution
Box and Whisker Plot
Five specific values are used:
Median, $Q_2$
First quartile, $Q_1$
Third quartile, $Q_3$
Minimum value in the data set
Maximum value in the data set
Inner Fences
IQR = $Q_3 - Q_1$
Lower inner fence = $Q_1$ - 1.5 IQR
Upper inner fence = $Q_3$ + 1.5 IQR
Outer Fences
Lower outer fence = $Q_1$ - 3.0 IQR
Upper outer fence = $Q_3$ + 3.0 IQR
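The fence formulas can be sketched as follows. The slide gives no worked numbers, so the example reuses Q1 = 111.5 and Q3 = 123.5 from the quartiles example purely as an illustration; `box_plot_fences` is a hypothetical helper name.

```python
def box_plot_fences(q1, q3):
    """Return the inner and outer fences as (lower, upper) pairs."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)
    return inner, outer

inner, outer = box_plot_fences(111.5, 123.5)
print(inner)  # (93.5, 141.5)
print(outer)  # (75.5, 159.5)

# Every value of the quartile example's data lies inside the inner fences.
data = [106, 109, 114, 116, 121, 122, 125, 129]
print(all(inner[0] <= x <= inner[1] for x in data))  # True
```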
Box and Whisker Plot
[Figure: box and whisker plot labeling Minimum, $Q_1$, $Q_2$, $Q_3$, and Maximum.]
Measures of Association
Measures of association are statistics that
yield information about the relatedness of
numerical variables.
Correlation is a measure of the degree of
relatedness of variables.
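Pearson Product-Moment
Correlation Coefficient
$r = \frac{SSXY}{\sqrt{(SSX)(SSY)}} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}}$
$-1 \leq r \leq 1$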
Three Degrees of Correlation
[Figure: three scatter plots showing r < 0, r = 0, and r > 0.]
Computation of r for
the Economics Example (Part 1)
Day           Interest X    Futures Index Y    $X^2$      $Y^2$      XY
1             7.43          221                55.205     48,841     1,642.03
2             7.48          222                55.950     49,284     1,660.56
3             8.00          226                64.000     51,076     1,808.00
4             7.75          225                60.063     50,625     1,743.75
5             7.60          224                57.760     50,176     1,702.40
6             7.63          223                58.217     49,729     1,701.49
7             7.68          223                58.982     49,729     1,712.64
8             7.67          226                58.829     51,076     1,733.42
9             7.59          226                57.608     51,076     1,715.34
10            8.07          235                65.125     55,225     1,896.45
11            8.03          233                64.481     54,289     1,870.99
12            8.00          241                64.000     58,081     1,928.00
Summations    92.93         2,725              720.220    619,207    21,115.07
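Computation of r
for the Economics Example (Part 2)
$r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}} = \frac{21{,}115.07 - \frac{(92.93)(2725)}{12}}{\sqrt{\left[720.22 - \frac{(92.93)^2}{12}\right]\left[619{,}207 - \frac{(2725)^2}{12}\right]}} = .815$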
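The computational formula can be applied to the twelve days of data from Part 1. This is a minimal sketch; `pearson_r` is an illustrative name, and the sums are accumulated from the raw X and Y columns rather than taken from the rounded table entries.

```python
import math

interest = [7.43, 7.48, 8.00, 7.75, 7.60, 7.63, 7.68, 7.67, 7.59, 8.07, 8.03, 8.00]
futures = [221, 222, 226, 225, 224, 223, 223, 226, 226, 235, 233, 241]

def pearson_r(xs, ys):
    """r from the sums of X, Y, X^2, Y^2, and XY (the computational formula)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator

r = pearson_r(interest, futures)
print(f"{r:.3f}")  # 0.815
```

The result is positive and fairly close to 1, so interest rates and the futures index tend to rise together over these twelve days.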
Scatter Plot and Correlation Matrix
for the Economics Example
[Figure: scatter plot of Futures Index (vertical axis, 220 to 245) against Interest (horizontal axis, 7.40 to 8.20).]
                 Interest    Futures Index
Interest         1
Futures Index    0.815254    1
Copyright 2008 John Wiley & Sons, Inc.
All rights reserved. Reproduction or translation
of this work beyond that permitted in section 117
of the 1976 United States Copyright Act without
express permission of the copyright owner is
unlawful. Request for further information should
be addressed to the Permissions Department, John
Wiley & Sons, Inc. The purchaser may make
back-up copies for his/her own use only and not
for distribution or resale. The Publisher assumes
no responsibility for errors, omissions, or damages
caused by the use of these programs or from the
use of the information herein.