Normal DistrCent Tendency Measures of Dispersion

Central tendency measures

Uploaded by

Arvind Kushwaha
Copyright
© All Rights Reserved

Normal Distribution

Measures of Central Tendency


Measures of Dispersion
Standard Normal Variate (SNV). It is a measure of location
that expresses how many SDs an observation lies from the mean of its dataset.
A normal frequency curve can be described completely by its
mean and SD values. Different variables give different normal curves,
each in its own units of measurement.
To eliminate the effect of the choice of units of
measurement, the data can be put in a unit-free form, i.e., the data
can be normalized.
• The first step in transforming the original variable to the normalized
variable is to calculate the mean and SD.
• The normalized values are then calculated by subtracting the mean
from each individual value and dividing by the SD.

These normalized values are also called the z values:
$z_X = \frac{X_i - \bar{X}}{s_X}$

The z values always have a mean of 0 and an SD of 1. When the
original variable is normally distributed, z follows the normal
distribution with mean 0 and SD 1. The z values are often called
the SNV, and their distribution the standard (unit) normal distribution.
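The normalization steps can be sketched in Python. This is a minimal illustration and not part of the original slides: the function name `z_scores` is hypothetical, the data are the ten serum cholesterol values from the problem later in this deck, and using the population SD (`pstdev`) is an assumed choice.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return (value - mean) / SD for each value.

    Assumption: SD is the population SD (divisor n), chosen for illustration.
    """
    m = mean(values)
    sd = pstdev(values)
    return [(v - m) / sd for v in values]

# Serum cholesterol values from the problem later in this deck.
cholesterol = [192, 242, 203, 212, 175, 284, 256, 218, 182, 228]
z = z_scores(cholesterol)

# The z values are unit free: their mean is 0 and their SD is 1,
# up to floating-point rounding.
print(round(mean(z), 10), round(pstdev(z), 10))
```

Rescaling the data (for example, changing the unit of measurement) leaves the z values unchanged, which is the point of normalizing.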
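Standard Normal Distribution
[Figure: standard normal curve. Horizontal axis: Standard Score (z), from -4 to 4. Vertical axis: density, from 0.00 to 0.40. Marked areas: .50 on each side of the mean, .34 between z = 0 and z = 1, .135 between z = 1 and z = 2, .025 beyond z = 2.]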
Normal Distribution Problems
• Suppose the NEET exam of 720 marks has a
mean score of 500 and a standard deviation of
100.
• Ram wants to be accepted to a medical course at
AIIMS that requires applicants to score at or
above the 84th percentile. In other words,
Ram must be among the top 16% to be
admitted.
• What score does Ram need on the test?
To solve these problems, start by drawing
the standard normal distribution.
Next, write the formula for z:
$z_X = \frac{X_i - \bar{X}}{s_X} = \frac{X_i - 500}{100}$
[Figure: Standard Normal Distribution. Horizontal axis: Standard Score (z), from -4 to 4. The area above z = +1 is .135 + .025 = .16.]
Next: Label the Landmarks

z_X    –2    –1     0     1     2
X     300   400   500   600   700

Now Check the Normal Areas
• We now know that (approximately):
• 2.5% score below 300, i.e., z = –2
• 16% score below 400, i.e., z = –1
• 50% score below 500, i.e., z = 0
• 84% score below 600, i.e., z = +1
• 97.5% score below 700, i.e., z = +2
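The rounded areas above can be checked against the exact normal curve. The sketch below uses `NormalDist` from Python's standard `statistics` module with the mean (500) and SD (100) given in the problem. The variable names are illustrative only.

```python
from statistics import NormalDist

# Mean and SD of the exam scores, as stated in the problem.
scores = NormalDist(mu=500, sigma=100)

# Fraction of candidates expected to score below each landmark.
areas = {x: scores.cdf(x) for x in (300, 400, 500, 600, 700)}
for x, p in areas.items():
    print(f"below {x}: {p:.4f}")
```

The exact areas are close to 0.0228, 0.1587, 0.5, 0.8413, and 0.9772, so the slide's 2.5%, 16%, 50%, 84%, and 97.5% are convenient roundings.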
Solution
• Ram had to be among the top 16% to be
accepted.
• That means his z-score must be at least +1.
• Thus, his raw score must be at least 600,
which is one standard deviation (100) above
the mean (500).
• Therefore, Ram needs to score at least 600.
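As a hedged cross-check of this solution, the 84th percentile can be computed directly instead of being read from the landmarks. The sketch assumes the scores are exactly normal with mean 500 and SD 100, and the names `cutoff` and `z_cutoff` are illustrative.

```python
from statistics import NormalDist

scores = NormalDist(mu=500, sigma=100)

# Score at the 84th percentile (the inverse of the cumulative area).
cutoff = scores.inv_cdf(0.84)

# The same cutoff expressed as a z-score.
z_cutoff = (cutoff - 500) / 100

print(f"cutoff score: {cutoff:.1f}, z: {z_cutoff:.3f}")
```

The exact cutoff falls just below 600 because z = +1 corresponds to about the 84.13th percentile. Rounding to z = +1 and a score of 600, as the slide does, is the usual rule-of-thumb answer.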
Measures of Central Tendency: Mean
• The mean, or average, is the central value of the series, obtained
by dividing the sum of all values by the number of
observations. It is denoted as x̄.
• It is based on all the observations and is easy to understand.
• It is the measure least affected by the fluctuations of sampling, and it is
sensitive to changes in the data.
• Loss of even a single observation makes it impossible to calculate
the mean.
• In the case of extreme observations, it is not a representative measure.
• It is the most appropriate measure for representing data in the case of
a normal distribution of observations, but not for skewed
distributions.
Median
• The median is a locative (positional) measure: the middle-
most observation after all the values are arranged in
ascending or descending order.
• It is the value that divides the data into two equal
parts of 50% each, i.e., 50% of observations lie above
the median value and 50% lie below it.
• The median is unique for a given set of data.
• It is more affected by the fluctuations of sampling than
the mean.
• It is not affected by extreme observations.
Mode
• The mode is the most frequently occurring observation in the
dataset, i.e., the value that repeats the maximum
number of times.
• It is very easy to understand and calculate.
• It does not depend on all the observations for its
calculation.
• It is more likely to be affected by fluctuations of
sampling than the mean and median.
• At times a frequency distribution is bimodal,
and some datasets have no
mode at all.
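The three measures can be compared on a small dataset. The sketch below uses the ten serum cholesterol values from the problem later in this deck. The extreme value 900 and the bimodal list are hypothetical, added only to illustrate the points about extreme observations and multiple modes.

```python
from statistics import mean, median, multimode

# Serum cholesterol values from the problem later in this deck.
cholesterol = [192, 242, 203, 212, 175, 284, 256, 218, 182, 228]

avg = mean(cholesterol)        # sum of values / number of observations
mid = median(cholesterol)      # middle of the sorted values
modes = multimode(cholesterol) # every value occurs once, so no single mode

# Hypothetical extreme observation: the mean shifts a lot, the median barely.
with_outlier = cholesterol + [900]
avg_out = mean(with_outlier)
mid_out = median(with_outlier)

# Hypothetical bimodal data: two values tie for the highest frequency.
bimodal_modes = multimode([1, 2, 2, 3, 3, 4])

print(avg, mid, len(modes))
print(round(avg_out, 2), mid_out)
print(bimodal_modes)
```

`multimode` returns every value that shares the highest frequency, so a result with as many entries as the dataset means that no value repeats and there is no meaningful mode.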
Relationships Between the Three Measures: Mean, Median, and
Mode

• For a symmetric curve: Mean = median = mode
• For a moderately skewed (asymmetric) curve: Mean − mode ≈ 3 (mean
− median)
• For a positively skewed curve: Mean > median > mode
• For a negatively skewed curve: Mean < median < mode
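The ordering of the three measures under skew can be seen on small made-up datasets. Both lists below are hypothetical and chosen only for illustration: `pos` has a long right tail, and `neg` is its mirror image.

```python
from statistics import mean, median, mode

# Hypothetical positively skewed data (long tail to the right).
pos = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

# Mirror image of the same data: negatively skewed (long tail to the left).
neg = [16 - x for x in pos]

for label, data in (("positive skew", pos), ("negative skew", neg)):
    print(label, "mean:", mean(data), "median:", median(data), "mode:", mode(data))
```

For `pos` the mean exceeds the median, which exceeds the mode, and the order reverses for `neg`. The relation mean − mode ≈ 3 (mean − median) is only an empirical approximation for moderately skewed distributions, so small samples like these need not satisfy it closely.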
Choice of the Measure of Central Tendency

• If the data are quantitative and symmetric or approximately
symmetric, the measure used is the arithmetic mean.
• If one or two observations are very large or very small
compared with the other observations (skewed data), the median
gives better results.
• In social and psychological studies that deal with scored
observations or data that cannot be directly measured
quantitatively, for example, socioeconomic status,
intelligence, or pain score, median or mode is a better
measure than mean.
Measures of Variability/Dispersion

• In contrast to measures of central tendency, which
describe the center of the dataset, measures of
variability describe the variability or spread of the
observations from the center of the data.
• The dispersion would be small if the values are
close to one another indicating compactness,
consistency, and reliability of the data collected,
whereas a higher value of dispersion indicates
that the values are widely spread out.
Measures of Dispersion
1. Range
2. Interquartile range
3. Mean deviation
4. Standard deviation (SD), i.e., root mean square deviation (RMSD)
5. Coefficient of variation
Range
• Range = Maximum value - Minimum value
• It uses only the extreme observations and ignores
the rest.
• It is easy to calculate.
• It is affected by the fluctuations of sampling.
• It gives a rough idea of the dispersion of the
data, but is not useful in statistical analysis.
Interquartile Range
• The interquartile range is the difference between the values of the upper
and lower quartiles, i.e., interquartile range = Q3 − Q1.
• Quartile positions in the ordered data (N = number of observations):
Q1 is at position N/4, Q2 at 2 × (N/4), Q3 at 3 × (N/4), and Q4 at 4 × (N/4).
• The interquartile range covers the middle 50% of the
values of the dataset.
• Although the interquartile range is easy to calculate,
it suffers from the same defects as those of range.
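A short sketch of the range and the interquartile range follows, using the serum cholesterol values from the problem later in this deck. It relies on `statistics.quantiles` with its default "exclusive" method, which interpolates at position i × (N + 1)/4. That is an assumption and differs slightly from the N/4 positions quoted above, so other quartile conventions can give somewhat different values.

```python
from statistics import quantiles

# Serum cholesterol values from the problem later in this deck.
cholesterol = [192, 242, 203, 212, 175, 284, 256, 218, 182, 228]

# Range = maximum value - minimum value.
data_range = max(cholesterol) - min(cholesterol)

# Quartile cut points Q1, Q2, Q3 (default method: "exclusive").
q1, q2, q3 = quantiles(cholesterol, n=4)

# Interquartile range = Q3 - Q1, the spread of the middle 50% of values.
iqr = q3 - q1

print("range:", data_range)
print("Q1:", q1, "Q2:", q2, "Q3:", q3, "IQR:", iqr)
```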
Mean Deviation (From mean/median)
• Mean deviation is the mean of the absolute differences
of the observations from a constant “A,” which can be taken as the
mean, median, mode, or any constant
observation from the data: $MD = \frac{1}{n}\sum_{i=1}^{n} |x_i - A|$
• The main drawback of this measure is that it
ignores the algebraic signs of the deviations. This drawback is
overcome in another measure of variability,
called variance.
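Mean deviation can be sketched as the average of the absolute deviations from a chosen constant A. The function name `mean_deviation` is hypothetical, and the data are the serum cholesterol values from the problem later in this deck.

```python
from statistics import mean, median

def mean_deviation(values, about):
    """Mean of the absolute deviations of the values from the constant `about`."""
    return sum(abs(v - about) for v in values) / len(values)

# Serum cholesterol values from the problem later in this deck.
cholesterol = [192, 242, 203, 212, 175, 284, 256, 218, 182, 228]

md_mean = mean_deviation(cholesterol, mean(cholesterol))      # A = mean
md_median = mean_deviation(cholesterol, median(cholesterol))  # A = median

print(round(md_mean, 2), round(md_median, 2))
```

Taking absolute values is what "ignores the algebraic signs": without it, the deviations from the mean would cancel and sum to zero.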
Standard Deviation (RMSD)
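• Variance is the mean of the squared deviations of the observations
from the mean: $\text{Variance} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$
• Most often we use the square root of the variance,
called the SD, to describe the data. Variance squares the units; the SD, by taking
the square root, brings the measure back to the
same units as the original and, hence, is regarded as the best
measure of variability. It is given as follows: $\text{SD} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$
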
Coefficient of Variation (CoV)
• It compares the variability in two datasets.
• It is a measurement of relative dispersion.
• It measures the variability relative to the
mean and is calculated as follows: $\text{CoV} = \frac{\text{SD}}{\text{mean}} \times 100$
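A minimal sketch of the coefficient of variation, taken here as SD divided by the mean and expressed as a percentage. The function name `coefficient_of_variation` is hypothetical, the sample SD (divisor n − 1) is an assumed choice, and the rescaled dataset is made up to show that the measure is unit free.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """SD relative to the mean, as a percentage (sample SD, an assumption)."""
    return stdev(values) / mean(values) * 100

# Serum cholesterol values from the problem later in this deck.
cholesterol = [192, 242, 203, 212, 175, 284, 256, 218, 182, 228]

# The same measurements in a hypothetical unit ten times smaller.
rescaled = [v * 10 for v in cholesterol]

cov_original = coefficient_of_variation(cholesterol)
cov_rescaled = coefficient_of_variation(rescaled)

print(round(cov_original, 2), round(cov_rescaled, 2))
```

Both calls give the same percentage, which is why the coefficient of variation can compare variability across datasets measured in different units or with very different means.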
Problem
• Example 5: Calculate the variance and SD for the
following data on serum cholesterol levels of
10 patients: 192, 242, 203, 212, 175, 284, 256,
218, 182, 228
(mean = 219.2)
• Replace n by n − 1 if there are fewer than 30 observations.
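One way to work this problem is sketched below with Python's standard `statistics` module. Because there are only 10 observations, the n − 1 divisor applies under the rule above. The n-divisor results are shown alongside for comparison, and the variable names are illustrative.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

cholesterol = [192, 242, 203, 212, 175, 284, 256, 218, 182, 228]

m = mean(cholesterol)

# Sum of squared deviations from the mean.
ss = sum((x - m) ** 2 for x in cholesterol)

# Divisor n - 1 (fewer than 30 observations).
sample_var = variance(cholesterol)
sample_sd = stdev(cholesterol)

# Divisor n, for comparison.
pop_var = pvariance(cholesterol)
pop_sd = pstdev(cholesterol)

print("mean:", m, "sum of squares:", round(ss, 2))
print("n - 1:", round(sample_var, 2), round(sample_sd, 2))
print("n:    ", round(pop_var, 2), round(pop_sd, 2))
```

With the n − 1 divisor the variance is about 1171.51 and the SD about 34.23. With the n divisor they are about 1054.36 and 32.47.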
QUIZ

Same Mean, Different SD

Same SD Different Mean


Further Reading/Home Work
• Central Limit Theorem (CLT).
