Chapter 3(Technical English for Statistics)

Chapter 3 provides definitions and explanations of key statistical concepts such as mean, median, mode, variance, and standard deviation, as well as various measures of central tendency and variation. It discusses the importance of understanding different types of distributions, including skewed and symmetric distributions, and introduces methods for calculating percentiles, quartiles, and identifying outliers. The chapter emphasizes the significance of these statistical measures in analyzing and interpreting data effectively.

Uploaded by

kirilyakov96


Chapter 3

Data Description
Definitions

Statistic
Characteristic or measure obtained from a sample
Parameter
Characteristic or measure obtained from a population
Mean
Sum of all the values divided by the number of values. This can either be a population
mean (denoted by mu) or a sample mean (denoted by x bar)
Median
The midpoint of the data after being ranked (sorted in ascending order). There are as
many numbers below the median as above the median.
Mode
The most frequent number
Skewed Distribution
The majority of the values lie together on one side with a very few values (the tail) to
the other side. In a positively skewed distribution, the tail is to the right and the mean
is larger than the median. In a negatively skewed distribution, the tail is to the left and
the mean is smaller than the median.
Symmetric Distribution
The data values are evenly distributed on both sides of the mean. In a symmetric
distribution, the mean is the median.
Weighted Mean
The mean when each value is multiplied by its weight and summed. This sum is
divided by the total of the weights.
Midrange
The mean of the highest and lowest values. (Max + Min) / 2
Range
The difference between the highest and lowest values. Max - Min
Population Variance
The average of the squares of the distances from the population mean. It is the sum of
the squares of the deviations from the mean divided by the population size.
Sample Variance
Unbiased estimator of a population variance. Instead of dividing by the population
size, the sum of the squares of the deviations from the sample mean is divided by one
less than the sample size.
Standard Deviation
The square root of the variance. The population standard deviation is the square root
of the population variance and the sample standard deviation is the square root of the
sample variance. The sample standard deviation is not the unbiased estimator for the
population standard deviation.
Coefficient of Variation
Standard deviation divided by the mean, expressed as a percentage.
Chebyshev's Theorem
The proportion of the values that fall within k standard deviations of the mean is at
least $1 - 1/k^2$, where k > 1. Chebyshev's theorem can be applied to any distribution
regardless of its shape.
Empirical or Normal Rule
Only valid when a distribution is bell-shaped (normal). Approximately 68% of the
data lies within 1 standard deviation of the mean; 95% of the data lies within 2
standard deviations; and 99.7% of the data lies within 3 standard deviations of the
mean.
Standard Score or Z-Score
The value obtained by subtracting the mean and dividing by the standard deviation.
When all values are transformed to their standard scores, the new mean (for Z) will be
zero and the standard deviation will be one.
Percentile
The percent of the population which lies below that value. The data must be ranked to
find percentiles.
Quartile
Either the 25th, 50th, or 75th percentiles. The 50th percentile is also called the median.
Decile
Either the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, or 90th percentiles.
Box and Whiskers Plot (Box Plot)
A graphical representation of the minimum value, lower fourth (hinge), median, upper
fourth, and maximum. Some textbooks define the five values as the minimum, first
Quartile, median, third Quartile, and maximum.
Five Number Summary
Minimum value, lower fourth, median, upper fourth, and maximum.
InterQuartile Range (IQR)
The difference between the 3rd and 1st Quartiles.
Outlier
An extremely high or low value when compared to the rest of the values.
Mild Outliers
Values which lie between 1.5 and 3.0 times the InterQuartile Range below the 1st
Quartile or above the 3rd Quartile.
Extreme Outliers
Values which lie more than 3.0 times the InterQuartile Range below the 1st Quartile or
above the 3rd Quartile.

Measures of Central Tendency

The term "Average" is vague

"Average" could mean one of four things: the arithmetic mean, the median, the midrange, or
the mode. For this reason, it is better to specify which average you're talking about.

Mean

This is what people usually intend when they say "average"

Population Mean: $\mu = \frac{\sum x}{N}$

Sample Mean: $\bar{x} = \frac{\sum x}{n}$

Sample Mean for Frequency Distribution: $\bar{x} = \frac{\sum (x \cdot f)}{\sum f}$

The mean of a frequency distribution is also the weighted mean.
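As a minimal sketch of the statement above, the following Python computes the mean of some raw data and the weighted mean of the same data written as a frequency table. The function names and the data are hypothetical, chosen only for illustration.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by the number of values."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Each value times its weight, summed, then divided by the total weight."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)


# Hypothetical data: the same six observations, raw and as a frequency table.
raw = [2, 2, 3, 3, 3, 5]
values = [2, 3, 5]
freqs = [2, 3, 1]

print(mean(raw))                     # 3.0
print(weighted_mean(values, freqs))  # 3.0
```

Using the frequencies as weights gives the same answer as averaging the raw values, which is why the two formulas agree.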

Median

The data must be ranked (sorted in ascending order) first. The median is the number in the
middle.

Several formulas could be used to find the depth of the median. The one that we will use is:
Depth of median = 0.5 * (n + 1)

Raw Data

The median is the number in the "depth of the median" position. If the sample size is even, the
depth of the median will be a decimal -- you need to find the midpoint between the numbers
on either side of the depth of the median.
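The raw-data procedure can be sketched in Python as follows. The function name and the sample data are hypothetical; the depth is treated as a 1-based position in the ranked list.

```python
def median_by_depth(data):
    """Median located with depth = 0.5 * (n + 1) on the ranked data."""
    ranked = sorted(data)
    n = len(ranked)
    depth = 0.5 * (n + 1)           # 1-based position in the ranked list
    lower = ranked[int(depth) - 1]  # value at the whole-number part of the depth
    if depth == int(depth):
        return lower                # odd n: the depth is an exact position
    upper = ranked[int(depth)]      # even n: the depth ends in .5
    return (lower + upper) / 2      # midpoint of the two neighbouring values


print(median_by_depth([7, 1, 3, 9, 5]))  # 5
print(median_by_depth([7, 1, 3, 9]))     # 5.0
```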

Ungrouped Frequency Distribution

Find the cumulative frequencies for the data. The first value with a cumulative frequency
greater than or equal to the depth of the median is the median. If the depth of the median is
exactly 0.5 more than the cumulative frequency of the previous class, then the median is the
midpoint between the two classes.
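A rough sketch of this cumulative-frequency search is given below. The function name and data are hypothetical. One assumption is labeled in the code: the comparison is written as "greater than or equal to", so that an integer depth (odd sample size) selects the value sitting at exactly that position.

```python
def median_from_frequencies(values, freqs):
    """Median of an ungrouped frequency distribution (values in ascending order)."""
    n = sum(freqs)
    depth = 0.5 * (n + 1)
    cumulative = 0
    for i, (value, f) in enumerate(zip(values, freqs)):
        previous = cumulative
        cumulative += f
        # Assumption: ">=" so an integer depth lands on the value at that position.
        if cumulative >= depth:
            if depth == previous + 0.5:
                # The depth falls between the previous value and this one.
                return (values[i - 1] + value) / 2
            return value
    raise ValueError("frequency distribution is empty")


print(median_from_frequencies([1, 2, 3], [1, 3, 2]))  # 2
print(median_from_frequencies([1, 2], [2, 2]))        # 1.5
```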

Grouped Frequency Distribution

Since the data is grouped, the original values have been lost. Some textbooks have you
simply take the midpoint of the median class. This is an over-simplification which isn't the
true value (but is much easier to do). The correct process is to interpolate.

Find out what proportion of the distance into the median class the median lies by dividing
the sample size by 2, subtracting the cumulative frequency of the previous class, and then
dividing all that by the frequency of the median class.

Multiply this proportion by the class width and add it to the lower boundary of the median
class.
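The interpolation can be sketched as below. The function name, the class boundaries, and the frequencies are all hypothetical, and equal class widths are assumed.

```python
def grouped_median(lower_boundaries, class_width, freqs):
    """Interpolated median of a grouped frequency distribution.

    Assumes equal-width classes listed in ascending order.
    """
    n = sum(freqs)
    if n == 0:
        raise ValueError("frequency distribution is empty")
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_boundaries, freqs):
        if cumulative + f >= n / 2:
            # Proportion of the way into the median class.
            proportion = (n / 2 - cumulative) / f
            return lower + proportion * class_width
        cumulative += f


# Hypothetical classes 9.5-19.5, 19.5-29.5, 29.5-39.5 with frequencies 3, 5, 2.
print(grouped_median([9.5, 19.5, 29.5], 10, [3, 5, 2]))  # 23.5
```

Here n / 2 = 5, the previous cumulative frequency is 3, and the median class holds 5 values, so the median lies (5 - 3) / 5 = 0.4 of the way into the class: 19.5 + 0.4 * 10 = 23.5.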
Mode

The mode is the most frequent data value. There may be no mode if no one value appears
more than any other. There may also be two modes (bimodal), three modes (trimodal), or
more than three modes (multi-modal).

For grouped frequency distributions, the modal class is the class with the largest frequency.
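For raw data, finding the mode or modes might look like the sketch below. The function name is hypothetical, and returning an empty list when every distinct value is equally frequent is an assumption made to match the "no mode" case described above.

```python
from collections import Counter


def modes(data):
    """Return the most frequent value(s) in sorted order, or [] if there is no mode."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    # Assumption: several distinct values, all equally frequent, means "no mode".
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([1, 2, 2, 3]))             # [2]
print(modes([1, 1, 2, 2, 3]))          # [1, 2]
print(modes([1, 2, 3]))                # []
print(modes(["red", "blue", "red"]))   # ['red']
```

The last call uses nominal data, for which the mode is the only one of these averages that can be computed.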

Midrange

The midrange is simply the midpoint between the highest and lowest values.

Summary

The Mean is used in computing other statistics (such as the variance) and does not exist for
open ended grouped frequency distributions. It is often not appropriate for skewed
distributions such as salary information.

The Median is the center number and is good for skewed distributions because it is resistant to
change.

The Mode is used to describe the most typical case. The mode can be used with nominal data
whereas the others can't. The mode may or may not exist and there may be more than one
value for the mode.

The Midrange is not used very often. It is a very rough estimate of the average and is greatly
affected by extreme values.

Property                     Mean   Median   Mode   Midrange

Always exists                No     Yes      No     Yes

Uses all data values         Yes    No       No     No

Affected by extreme values   Yes    No       No     Yes

Measures of Variation
Range

The range is the simplest measure of variation to find. It is simply the highest value minus the
lowest value.
RANGE = MAXIMUM - MINIMUM

Since the range only uses the largest and smallest values, it is greatly affected by extreme
values, that is - it is not resistant to change.

Variance

"Average Deviation"

The range only involves the smallest and largest numbers, and it would be desirable to have a
statistic which involved all of the data values.

The average deviation is defined as:

$\frac{\sum (x - \mu)}{N}$

The problem is that this summation is always zero, because the positive and negative
deviations cancel. So, the average deviation will always be zero. That is why the average
deviation is never used.
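A quick numerical check of this claim, using hypothetical data and a hypothetical function name:

```python
def average_deviation(data):
    """Mean of the signed deviations from the mean."""
    mu = sum(data) / len(data)
    return sum(x - mu for x in data) / len(data)


data = [2, 4, 4, 10]            # mean is 5, deviations are -3, -1, -1, 5
print(average_deviation(data))  # 0.0
```

The deviations -3, -1, -1, and 5 cancel exactly, and the same cancellation happens for any data set.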

Population Variance

So, to keep it from being zero, each deviation from the mean is squared, giving the "squared
deviation from the mean". The average of these squared deviations from the mean is called the
variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$

Unbiased Estimate of the Population Variance

One would expect the sample variance to simply be the population variance with the
population mean replaced by the sample mean. However, one of the major uses of a statistic is
to estimate the corresponding parameter, and that formula is biased: on average it
underestimates the population variance. To counteract this, the sum of the squares of the
deviations is divided by one less than the sample size: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

Standard Deviation

There is a problem with variances. Recall that the deviations were squared. That means that
the units were also squared. To get the units back the same as the original data values, the
square root must be taken.
The sample standard deviation is not the unbiased estimator for the population standard
deviation.
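The difference between the population and sample versions can be sketched as follows. The helper names and data are hypothetical; the results are cross-checked against Python's standard `statistics` module.

```python
import math
import statistics


def population_variance(data):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)


def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    x_bar = sum(data) / len(data)
    return sum((x - x_bar) ** 2 for x in data) / (len(data) - 1)


data = [2, 4, 4, 10]  # mean 5, sum of squared deviations 36

print(population_variance(data))             # 9.0
print(sample_variance(data))                 # 12.0
print(math.sqrt(population_variance(data)))  # 3.0, in the original units
print(math.sqrt(sample_variance(data)))      # about 3.46

# The standard library provides the same four quantities.
assert math.isclose(population_variance(data), statistics.pvariance(data))
assert math.isclose(sample_variance(data), statistics.variance(data))
assert math.isclose(math.sqrt(sample_variance(data)), statistics.stdev(data))
```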

Sum of Squares

The sum of the squares of the deviations from the mean is given a shortcut notation and
several alternative formulas.

$SS(x) = \sum (x - \bar{x})^2$

A little algebraic simplification returns:

$SS(x) = \sum x^2 - \frac{(\sum x)^2}{n}$
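As a hedged illustration, the sketch below compares the definition of the sum of squares with the standard shortcut identity, sum of x squared minus (sum of x) squared over n. The function names and data are hypothetical.

```python
def ss_definition(data):
    """Sum of squared deviations from the mean, straight from the definition."""
    x_bar = sum(data) / len(data)
    return sum((x - x_bar) ** 2 for x in data)


def ss_shortcut(data):
    """Shortcut form: sum of x^2 minus (sum of x)^2 / n."""
    n = len(data)
    return sum(x * x for x in data) - sum(data) ** 2 / n


data = [2, 4, 4, 10]
print(ss_definition(data))  # 36.0
print(ss_shortcut(data))    # 36.0
```

The shortcut needs only the running totals of x and x squared, so the mean does not have to be computed first.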

Chebyshev's Theorem

The proportion of the values that fall within k standard deviations of the mean will be at least

$1 - \frac{1}{k^2}$, where k is any number greater than 1.

"Within k standard deviations" is interpreted as the interval: $\bar{x} - ks$ to $\bar{x} + ks$.

Chebyshev's Theorem is true for any sample set, no matter what the distribution.
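The bound can be checked against a deliberately skewed data set, as in the sketch below. The function names and data are hypothetical, and the population standard deviation (`statistics.pstdev`) is an assumption made for the check.

```python
import statistics


def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2


def proportion_within(data, k):
    """Observed proportion of values within k standard deviations of the mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # assumption: population standard deviation
    inside = [x for x in data if mean - k * sd <= x <= mean + k * sd]
    return len(inside) / len(data)


skewed = [1, 1, 1, 1, 2, 2, 3, 4, 10, 25]  # hypothetical, strongly right-skewed

print(chebyshev_lower_bound(2))  # 0.75
print(proportion_within(skewed, 2) >= chebyshev_lower_bound(2))  # True
```

The theorem only guarantees a minimum, so the observed proportion is usually well above the bound.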

Empirical Rule

The empirical rule is only valid for bell-shaped (normal) distributions. The following
statements are true.

- Approximately 68% of the data values fall within one standard deviation of the mean.
- Approximately 95% of the data values fall within two standard deviations of the mean.
- Approximately 99.7% of the data values fall within three standard deviations of the
  mean.

The empirical rule will be revisited later in the chapter on normal probabilities.
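In the meantime, the three percentages can be reproduced from the standard normal distribution. This is a small sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the function name is hypothetical.

```python
from statistics import NormalDist


def normal_proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    standard = NormalDist(mu=0, sigma=1)
    return standard.cdf(k) - standard.cdf(-k)


for k in (1, 2, 3):
    print(k, round(normal_proportion_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```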

Measures of Position

Standard Scores (z-scores)

The standard score is obtained by subtracting the mean and dividing the difference by the
standard deviation. The symbol is z, which is why it's also called a z-score.
The mean of the standard scores is zero and the standard deviation is 1. This is the nice
feature of the standard score -- no matter what the original scale was, when the data is
converted to its standard score, the mean is zero and the standard deviation is 1.
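A brief sketch of this property, with a hypothetical function name and data, and using the sample standard deviation as an assumption:

```python
import statistics


def z_scores(data):
    """Standard score of each value: (x - mean) / standard deviation."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # assumption: sample standard deviation
    return [(x - mean) / sd for x in data]


scores = z_scores([52, 61, 70, 74, 88])  # hypothetical test scores

# Up to floating-point rounding, the mean is 0 and the standard deviation is 1.
print(abs(statistics.mean(scores)) < 1e-9)       # True
print(abs(statistics.stdev(scores) - 1) < 1e-9)  # True
```

The same result holds whatever the original scale was, which is what makes z-scores from different data sets comparable.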

Percentiles, Deciles, Quartiles

Percentiles (100 regions)

The kth percentile is the number which has k% of the values below it. The data must be
ranked.

1. Rank the data

2. Find k% (k /100) of the sample size, n.
3. If this is an integer, add 0.5. If it isn't an integer, round up to the next integer.
4. Find the number in this position. If your depth ends in 0.5, then take the midpoint
between the two numbers.
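The four steps above can be sketched as follows. The function name and data are hypothetical, and k is assumed to be a whole number strictly between 0 and 100 so that integer arithmetic can decide whether k% of n is an integer.

```python
def kth_percentile(data, k):
    """kth percentile by the depth rule; k is an integer with 0 < k < 100."""
    if not 0 < k < 100:
        raise ValueError("k must be strictly between 0 and 100")
    ranked = sorted(data)                        # step 1: rank the data
    whole, remainder = divmod(k * len(ranked), 100)  # step 2: k% of n
    if remainder == 0:
        # Step 3, integer case: depth = whole + 0.5, so step 4 takes the
        # midpoint of the values at positions whole and whole + 1 (1-based).
        return (ranked[whole - 1] + ranked[whole]) / 2
    # Step 3, non-integer case: round up; step 4 reads that position.
    return ranked[whole]


data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(kth_percentile(data, 25))  # 30
print(kth_percentile(data, 50))  # 55.0
print(kth_percentile(data, 80))  # 85.0
```

With n = 10, the 25th percentile gives 2.5, which rounds up to position 3, while the 50th gives exactly 5, so the depth is 5.5 and the answer is the midpoint of the 5th and 6th values.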

It is sometimes easier to count from the high end rather than counting from the low end. For
example, the 80th percentile is the number which has 80% below it and 20% above it. Rather
than counting 80% from the bottom, count 20% from the top.

Note: The 50th percentile is the median.

If you wish to find the percentile for a number (rather than locating the kth percentile), then

1. Take the number of values below the number

2. Add 0.5
3. Divide by the total number of values
4. Convert it to a percent
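The reverse lookup can be sketched in a few lines. The function name and data are hypothetical.

```python
def percentile_rank(data, x):
    """Percentile of x: (number of values below x + 0.5) / n, as a percent."""
    below = sum(1 for value in data if value < x)
    return (below + 0.5) / len(data) * 100


data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(percentile_rank(data, 30))  # 25.0
```

Two values lie below 30, so the result is (2 + 0.5) / 10 = 0.25, or the 25th percentile.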

Deciles (10 regions)

The percentiles divide the data into 100 equal regions. The deciles divide the data into 10
equal regions. The instructions are the same as for finding a percentile, except that instead
of dividing by 100 in step 2, you divide by 10.

Quartiles (4 regions)

The quartiles divide the data into 4 equal regions. Instead of dividing by 100 in step 2, divide
by 4.

Note: The 2nd quartile is the same as the median. The 1st quartile is the 25th percentile, the 3rd
quartile is the 75th percentile.

The quartiles are commonly used (much more so than the percentiles or deciles).
Five Number Summary

The five number summary consists of the minimum value, lower fourth, median, upper fourth,
and maximum value.

Box and Whiskers Plot

A graphical representation of the five number summary. A box is drawn between the lower
and upper fourths with a line at the median. Whiskers (a single line, not a box) extend from
the fourths to lines at the minimum and maximum values.

Interquartile Range (IQR)

The interquartile range is the difference between the third and first quartiles. That's it:
IQR = Q3 - Q1

Outliers

Outliers are extreme values. There are mild outliers and extreme outliers.

Extreme Outliers

Extreme outliers are any data values which lie more than 3.0 times the interquartile range
below the first quartile or above the third quartile. x is an extreme outlier if ...

x < Q1 - 3 * IQR

x > Q3 + 3 * IQR

Mild Outliers

Mild outliers are any data values which lie between 1.5 times and 3.0 times the interquartile
range below the first quartile or above the third quartile. x is a mild outlier if ...

Q1 - 3 * IQR <= x < Q1 - 1.5 * IQR

Q3 + 1.5 * IQR < x <= Q3 + 3 * IQR
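Taken together, the two definitions amount to the small classifier sketched below. The function name and the quartile values are hypothetical, and the quartiles are assumed to have been computed already.

```python
def classify_outlier(x, q1, q3):
    """Label x as 'extreme', 'mild', or 'none' given the quartiles."""
    iqr = q3 - q1
    if x < q1 - 3 * iqr or x > q3 + 3 * iqr:
        return "extreme"
    if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr:
        return "mild"
    return "none"


# Hypothetical quartiles: Q1 = 10, Q3 = 20, so IQR = 10.
# Mild limits are -5 and 35; extreme limits are -20 and 50.
for x in (35, 40, 50, 51, -10, -21):
    print(x, classify_outlier(x, 10, 20))
# 35 none
# 40 mild
# 50 mild
# 51 extreme
# -10 mild
# -21 extreme
```

Checking the extreme condition first means the mild test only needs the 1.5 * IQR limits.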
