Basic Statistics
INTRODUCTION
What is Statistics
This course will restrict itself to descriptive statistics, specifically: measures of central
tendency, measures of relative position and measures of dispersion.
We shall first consider what data is and the issues surrounding it.
Data
Categories of data
Types of data
Populations
1. Target population
2. Accessible population
3. Sample population
4. Units – the actual respondents filling in a questionnaire or being interviewed.
TABULATION OF DATA
Examples
1. In the year 2000, a certain firm employed 90 staff, of whom 79 were men. During the
year, 17 staff left, and 13 of these were men. The total recruitment during the year was 13,
of whom 3 were women.
In 2001 wastage declined by 3 among men as compared with 2000, and no woman left. 6
more men but 1 fewer women were recruited than in the previous year. The total number of
employees as at 1st January 2002 was 93. Tabulate the above data.
Employees of a firm
                                       2000                    2001
                               Men   Women   Total     Men   Women   Total
Employees as at 1st January     79      11      90      76      10      86
Recruitment                     10       3      13      16       1      17
Left                          (13)     (4)    (17)    (10)       -    (10)
Total                           76      10      86      82      11      93
2. The following report was prepared by an exams officer on the performance of students in a
certain institution: out of 3,500 male candidates below 20 years of age, 500 passed; of the
1,100 male candidates 20 years and over, 900 failed. As regards the female candidates, out
of 500 candidates below 20 years of age, 400 failed; of the 340 females 20 years and above,
80 passed. Present the data in tabular form.
Student Performance
                               Passed   Failed    Total
Below 20 years:      Men          500    3,000    3,500
                     Women        100      400      500
20 years and above:  Men          200      900    1,100
                     Women         80      260      340
Total                             880    4,560    5,440
Example
1. The following marks were obtained by students in an exam. Group them into a frequency distribution.
40 46 51 41 33 65 48 43 36 71
74 39 56 50 58 40 37 68 37 25
54 55 55 49 38 44 59 73 44 50
47 56 61 41 40 42 58 66 38 39
Solution
Marks 25 – 34 35 – 44 45 – 54 55 – 64 65 – 74
No. of students 2 16 8 8 6
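The tallying behind the solution table can be checked with a short script. This is only an illustrative sketch: the function name `tally` and the variable names are assumptions made here, and the class limits are taken as inclusive, as in the table above.

```python
# Marks of the 40 students, copied from the example.
marks = [
    40, 46, 51, 41, 33, 65, 48, 43, 36, 71,
    74, 39, 56, 50, 58, 40, 37, 68, 37, 25,
    54, 55, 55, 49, 38, 44, 59, 73, 44, 50,
    47, 56, 61, 41, 40, 42, 58, 66, 38, 39,
]

# Inclusive class limits used in the solution table.
classes = [(25, 34), (35, 44), (45, 54), (55, 64), (65, 74)]


def tally(values, class_limits):
    """Count how many values fall in each inclusive class (lower..upper)."""
    return [
        sum(lower <= v <= upper for v in values)
        for lower, upper in class_limits
    ]


frequencies = tally(marks, classes)
for (lower, upper), f in zip(classes, frequencies):
    print(f"{lower} - {upper}: {f}")
```

The frequencies should add up to the number of students (40), which is a quick check that no mark was missed or counted twice.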
Solution
Age             30 – 34  35 – 39  40 – 44  45 – 49  50 – 54  55 – 59  60 – 64  65 – 69
No. of deaths         1        4        3        9       10        8        4        3
GRAPHS
Definition of Graphs
A graph shows the relationship between two variables by means of a curve or a straight line.
Graphs present data and help in the analysis and interpretation of data.
Graphs
Graphs are visual representations of data. They can take on many forms. They allow people to
quickly absorb information, observe trends and to easily interpolate and extrapolate data. They
are much easier to understand than a large table of numbers and if well constructed, should
provide the same amount of information as the table. In presentations, it is usually preferable to
use graphs as they convey your point quickly and without the need to understand what each and
every number in a table means.
Bar Graph
A bar graph uses bars to show data. The bars can be vertical (up and down), or horizontal
(across). The data can be in words or numbers.
Vertical Bar Graphs
In bar graphs, the greater the height, the greater the value.
Uses:
Bar graphs are used to highlight separate quantities, especially the differences between these
quantities. They are extremely useful for comparing quantities in different categories, and can be
used to describe the relationship of several variables at once. The data typically being
represented is the number of "occurrences" measured in different categories of data. They are
used in almost every field.
Line Graph
One of the most popular types of graphs, line graphs have two axes. The horizontal axis (x-axis) is for
the independent variable, and the vertical axis (y-axis) is for the dependent variable. Points on
the graph are connected by lines, hence the name.
Line Graph
Uses:
Line graphs are typically used to show how a value changes over time, though the independent
variable can really be anything. In a general sense, they are used to show how one value
changes as another changes uniformly or incrementally. They are used in almost every field.
Histograms
Histograms are very similar to bar graphs, except that there is no space between the bars. They are
used to show the frequency distribution of a continuous variable (e.g. the heights of students in a class).
Uses:
They are used a lot in statistical work and demographics.
Pie Graph/ Circle Graph
Pie graphs, in their simplest form, are circles subdivided into different coloured regions. The
greater the sliced area, the greater the category’s value.
Uses:
Pie charts are typically used to summarize categorical data or, even more often, percentage data.
The components have to add up to make a "whole" of sorts, or else the graph becomes
meaningless (e.g. student population, market segment, etc.). A chunk may be separated from the
rest of the pie to indicate its significance. They are used in almost every field.
1. Histogram
A histogram is a graphical method for presenting data, where the observations are located on
a horizontal axis (usually grouped into intervals) and the frequency of those observations is
depicted along the vertical axis.
The histogram should be clearly distinguished from a bar diagram. The distinction lies in the
fact that whereas a bar diagram is one-dimensional, i.e. only the length of the bar is material
and not the width, a histogram is two-dimensional, that is, in a histogram both the length and
the width are important.
EXAMPLE
Represent the following data by a histogram.
Class size Frequency
0 – 10 5
10 – 20 11
20 – 30 19
30 – 40 21
40 – 50 16
50 – 60 10
60 – 70 8
70 – 80 6
80 – 90 3
90 – 100 1
Solution
The histogram of the above data is given below.
2. Frequency polygon
A frequency polygon is a graph of a frequency distribution. It has more than four sides. It is
particularly effective in comparing two or more frequency distributions.
EXAMPLE
The daily profits (in thousand shillings) of 100 shops are distributed as follows:
No of shops: 12 18 27 20 17
Example
Draw a histogram, frequency polygon and frequency curve representing the following figures.
Solution
[Figure: Histogram, frequency polygon and curve. Horizontal axis: length of service.]
Cumulative frequencies are the running totals of the class frequencies. These frequencies are listed
in a table called a cumulative frequency table. The graph of such a distribution is called a
cumulative frequency curve or an ogive.
Example
The table below shows the distribution in weights of 75 pigs
Weights (kg) Frequency
10 – 20 1
20 – 30 7
30 – 40 8
40 – 50 11
50 – 60 19
60 – 70 10
70 – 80 7
80 – 90 5
90 – 100 4
100 – 110 3
Solution
Weights (kg)    Frequency    Cumulative frequency
10 – 20              1                 1
20 – 30              7                 8
30 – 40              8                16
40 – 50             11                27
50 – 60             19                46
60 – 70             10                56
70 – 80              7                63
80 – 90              5                68
90 – 100             4                72
100 – 110            3                75
CUMULATIVE FREQUENCY CURVE
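The cumulative frequency column is a running total, so it can be reproduced in a few lines. The sketch below uses the pig weights from the example; the variable names are illustrative, and pairing each cumulative frequency with the upper limit of its class (a "less than" ogive) is an assumption about how the curve is plotted.

```python
from itertools import accumulate

# Class limits (kg) and frequencies from the pig-weight example.
class_limits = [
    (10, 20), (20, 30), (30, 40), (40, 50), (50, 60),
    (60, 70), (70, 80), (80, 90), (90, 100), (100, 110),
]
frequencies = [1, 7, 8, 11, 19, 10, 7, 5, 4, 3]

# Running total of the frequencies.
cumulative = list(accumulate(frequencies))

# Points for a "less than" ogive: (upper class limit, cumulative frequency).
ogive_points = [(upper, cf) for (_, upper), cf in zip(class_limits, cumulative)]

for (lower, upper), f, cf in zip(class_limits, frequencies, cumulative):
    print(f"{lower:>3} - {upper:<3} {f:>3} {cf:>3}")
```

The last cumulative frequency must equal the total number of observations, here 75.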
HOMEWORK
The table below shows the number of people killed in road accidents by different vehicles.
Example
Find the arithmetic mean for the following data:
2, 4, 6, 8, 10
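Therefore

$$\text{A.M.} = \frac{\sum x}{N} = \frac{2 + 4 + 6 + 8 + 10}{5} = \frac{30}{5} = 6$$

The geometric mean of 1, 2, 3, 4, 5, 6 is

$$\text{G.M.} = \sqrt[6]{1 \times 2 \times 3 \times 4 \times 5 \times 6} = \sqrt[6]{720} = 2.99$$

The harmonic mean is

$$\text{H.M.} = \frac{N}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_N}}$$

For the values 2, 3, 4, 5, 6:

$$\text{H.M.} = \frac{5}{\frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6}} = \frac{5}{\frac{30 + 20 + 15 + 12 + 10}{60}} = \frac{5 \times 60}{87} = \frac{300}{87} = 3.448$$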
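The three means can be checked with Python's standard `statistics` module. This is a minimal sketch using the data sets that appear in the worked examples; the variable names are illustrative.

```python
import statistics

# Arithmetic mean of 2, 4, 6, 8, 10: sum of values divided by their count.
am = statistics.mean([2, 4, 6, 8, 10])                  # 30 / 5 = 6

# Geometric mean of 1..6: sixth root of the product 720.
gm = statistics.geometric_mean([1, 2, 3, 4, 5, 6])      # about 2.99

# Harmonic mean of 2..6: N divided by the sum of reciprocals.
hm = statistics.harmonic_mean([2, 3, 4, 5, 6])          # 300 / 87, about 3.448

print(am, round(gm, 2), round(hm, 3))
```

The reciprocals 1/2, 1/3, 1/4, 1/5 and 1/6 add up to 87/60, so the harmonic mean of these five values is 300/87, or about 3.448.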
THE MEDIAN
a) Discrete data
Median is the middle value when data is arranged in ascending or descending order.
$$\text{Median} = \left(\frac{N + 1}{2}\right)^{\text{th}} \text{ observation}$$

e.g. for 1, 2, 3, 4, 5: Median = ((5 + 1)/2)th observation = 3rd observation = 3
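The rule for discrete data can be sketched as follows. The function name `median_discrete` is hypothetical, and the handling of an even number of observations (averaging the two middle values) is a common convention assumed here rather than something stated in these notes.

```python
def median_discrete(values):
    """Median of ungrouped data: the middle value after sorting."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd N: the ((N + 1) / 2)-th observation (index `middle`, 0-based).
        return ordered[middle]
    # Even N (assumed convention): average of the two middle observations.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median_discrete([1, 2, 3, 4, 5]))  # 3rd observation
```

Sorting first matters: the formula gives a position in the ordered data, not in the order the data was collected.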
b) Grouped data
When determining the particular class in which the value of the median lies, use N/2 to locate the
median class and NOT (N + 1)/2, because with grouped data it is N/2 which divides
the area of the curve into two equal parts.

$$\text{Median} = L + \frac{N/2 - \text{p.c.f}}{f} \times i$$

where L = lower class limit of the median class
p.c.f = preceding cumulative frequency to the median class
f = frequency of the median class
i = the class interval of the median class
Example
1,500 workers are working in an industrial establishment. Their age is classified as follows:
Solutions
Age group F C.F
18 – 22 120 120
22 – 26 125 245
26 – 30 280 525
30 – 34 260 785
34 – 38 155 940
38 – 42 184 1,124
42 – 46 162 1,286
46 – 50 86 1,372
50 – 54 75 1,447
54 – 58 53 1,500
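Median = (N/2)th observation
= 1,500/2 = 750th observation

The 750th observation lies in the class 30 – 34, so L = 30, p.c.f = 525, f = 260 and i = 4.

$$\text{Median} = L + \frac{N/2 - \text{p.c.f}}{f} \times i = 30 + \frac{750 - 525}{260} \times 4 = 30 + 3.46 = 33.46$$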
Hence the median age of the workers is 33.46 years.
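The same steps (build the cumulative frequencies, locate the median class with N/2, then interpolate) can be traced in code. This is a sketch under the assumption that all classes have the same width of 4 years; the variable names are illustrative.

```python
from bisect import bisect_left
from itertools import accumulate

# Lower class limits, frequencies and class interval from the age example.
lower_limits = [18, 22, 26, 30, 34, 38, 42, 46, 50, 54]
frequencies = [120, 125, 280, 260, 155, 184, 162, 86, 75, 53]
width = 4

cumulative = list(accumulate(frequencies))
n = cumulative[-1]                       # 1,500 workers
position = n / 2                         # 750th observation

# Index of the first class whose cumulative frequency reaches N/2.
k = bisect_left(cumulative, position)
pcf = cumulative[k - 1] if k > 0 else 0  # preceding cumulative frequency

median_age = lower_limits[k] + (position - pcf) / frequencies[k] * width
print(lower_limits[k], pcf, frequencies[k], round(median_age, 2))
```

The interpolation assumes the 260 workers in the median class are spread evenly across its 4-year interval.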
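Quartiles are those values of the variate which divide the total frequency into four
equal parts.

$$Q_j = L + \frac{jN/4 - \text{p.c.f}}{f} \times i \quad \text{for } j = 1, 2, 3$$

where L, p.c.f, f and i refer to the respective quartile class.

Deciles are those values which divide the total frequency into ten equal parts.

$$D_k = L + \frac{kN/10 - \text{p.c.f}}{f} \times i \quad \text{for } k = 1, 2, 3, \dots, 9$$

where L = lower class limit of the respective decile class
p.c.f = preceding cumulative frequency to the respective decile class
f = frequency of the respective decile class
i = the class interval of the respective decile class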
Percentiles are those values which divide the total frequency into one hundred equal parts.

$$P_k = L + \frac{kN/100 - \text{p.c.f}}{f} \times i \quad \text{for } k = 1, 2, 3, \dots, 99$$

where L = lower class limit of the respective percentile class
p.c.f = preceding cumulative frequency to the respective percentile class
f = frequency of the respective percentile class
i = the class interval of the respective percentile class
Example
In an examination of 675 candidates the examiner supplied the following information
Marks obtained (in %) No of candidates
0 – 10 7
10 – 20 32
20 – 30 56
30 – 40 106
40 – 50 180
50 – 60 164
60 – 70 86
70 – 80 44
Solution
Marks (in %) No of candidates (f) C.F
0 – 10 7 7
10 – 20 32 39
20 – 30 56 95
30 – 40 106 201
40 – 50 180 381
50 – 60 164 545
60 – 70 86 631
70 – 80 44 675
Q1 = size of the (N/4)th observation = (675/4)th = 168.75th observation

Q1 lies in the class 30 – 40.

Therefore

$$Q_1 = L + \frac{N/4 - \text{p.c.f}}{f} \times i = 30 + \frac{168.75 - 95}{106} \times 10 = 30 + 6.96 = 36.96$$
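Quartiles, deciles and percentiles all use the same interpolation and differ only in the fraction of N being located (1/4, k/10, k/100). The sketch below wraps that idea in one function; `grouped_quantile` and its parameters are hypothetical names introduced for illustration, not part of the course notes.

```python
def grouped_quantile(classes, frequencies, fraction):
    """Value below which `fraction` of the total frequency lies.

    classes     -- list of (lower, upper) class limits
    frequencies -- frequency of each class
    fraction    -- e.g. 1/4 for Q1, 7/10 for D7, 80/100 for P80
    """
    n = sum(frequencies)
    if n == 0:
        raise ValueError("total frequency must be positive")
    target = fraction * n            # position of the required observation
    preceding = 0                    # p.c.f of the class being examined
    for (lower, upper), f in zip(classes, frequencies):
        if f > 0 and preceding + f >= target:
            # L + ((target - p.c.f) / f) x i
            return lower + (target - preceding) / f * (upper - lower)
        preceding += f
    raise ValueError("fraction must lie between 0 and 1")


# Examination marks of the 675 candidates.
classes = [(0, 10), (10, 20), (20, 30), (30, 40),
           (40, 50), (50, 60), (60, 70), (70, 80)]
frequencies = [7, 32, 56, 106, 180, 164, 86, 44]

q1 = grouped_quantile(classes, frequencies, 1 / 4)
print(round(q1, 2))
```

Passing a different fraction to the same function gives any other quartile, decile or percentile of the distribution.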
Exercise: Use the above example to calculate D4 and P80, and interpret the values.
THE MODE
Grouped Data
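$$M_0 = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$$

where L = lower class limit of the modal class
Δ1 = the difference between the frequency of the modal class and the frequency of the pre-
modal class, i.e. the preceding class
Δ2 = the difference between the frequency of the modal class and the frequency of the post-
modal class, i.e. the succeeding class
i = the size of the modal class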
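As an illustration, the grouped-data mode formula can be applied to the examination marks of the 675 candidates used earlier. This is a sketch only: `grouped_mode` is a hypothetical helper, it takes the modal class to be the one with the highest frequency, and it treats a missing neighbouring class as having frequency 0, which is an assumption made here.

```python
def grouped_mode(classes, frequencies):
    """Mode of grouped data: L + d1 / (d1 + d2) * i for the modal class."""
    # Index of the modal class (highest frequency; first one if tied).
    m = max(range(len(frequencies)), key=lambda k: frequencies[k])
    before = frequencies[m - 1] if m > 0 else 0                    # pre-modal class
    after = frequencies[m + 1] if m < len(frequencies) - 1 else 0  # post-modal class
    d1 = frequencies[m] - before
    d2 = frequencies[m] - after
    if d1 + d2 == 0:
        raise ValueError("mode is not defined by this formula for flat data")
    lower, upper = classes[m]
    return lower + d1 / (d1 + d2) * (upper - lower)


classes = [(0, 10), (10, 20), (20, 30), (30, 40),
           (40, 50), (50, 60), (60, 70), (70, 80)]
frequencies = [7, 32, 56, 106, 180, 164, 86, 44]

mode_mark = grouped_mode(classes, frequencies)
print(round(mode_mark, 2))
```

Here the modal class is 40 – 50 with frequency 180, so the two differences are 180 − 106 = 74 and 180 − 164 = 16, and the mode is 40 + (74/90) × 10, about 48.22.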
Measures of variation
A good measure of variation should possess, as far as possible, the following properties.
It should be:
(i) Simple to understand
(ii) Easy to compute
(iii) Rigidly defined
(iv) Based on each and every observation of the distribution.
(v) Not unduly affected by extreme observations.
Of these, the first four are mathematical methods and the last is a graphical one.
Standard Deviation
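$$\text{Standard Deviation } s = \sqrt{\frac{\sum (x - \bar{x})^2}{N}} \quad \text{or} \quad s = \sqrt{\frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2}$$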
Example
Find the standard deviation from the weekly wages of ten workers working in a factory.
Example
A) We first find the mean of 1,5,4,2,8 before we compute variance and standard deviation
B) We make a table to assist in computation of the required measures.
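The two steps can be traced with a short script. This is a sketch, not the table from the original notes: the column layout and variable names are assumptions, and the sum of squared deviations is divided by N (not N − 1), following the standard deviation formula given in this section.

```python
import math

values = [1, 5, 4, 2, 8]
n = len(values)

# A) The mean.
mean = sum(values) / n                      # 20 / 5 = 4

# B) A table of deviations from the mean and their squares.
deviations = [x - mean for x in values]
squared_deviations = [d ** 2 for d in deviations]

print(f"{'x':>3} {'x - mean':>9} {'(x - mean)^2':>13}")
for x, d, sq in zip(values, deviations, squared_deviations):
    print(f"{x:>3} {d:>9.1f} {sq:>13.1f}")

variance = sum(squared_deviations) / n      # 30 / 5 = 6
std_dev = math.sqrt(variance)               # about 2.449
print(mean, variance, round(std_dev, 3))
```

The deviations from the mean always sum to zero, which is why they are squared before being averaged.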
Exercise I
1. The following distribution gives the pattern of overtime work per week done by 100
employees of a company. Calculate median, first quartile (Q1) and 7th decile (D7)
Overtime hours 10 – 15 15 – 20 20 – 25 25 – 30 30 – 35 35 – 40
No. of employees 11 20 35 20 8 6
Exercise II
A certain disease affects children and sometimes kills them. The data below show the age at death
(in months) of children who died from the disease.
84 91 58 72 44 67 76 43 83 40 73 86 77 75 71
43 33 78 94 65 74 50 65 80 57 73 36 33 91 53
63 59 46 47 37 11 82 40 27 84 53 19 35 72 44
19 51 67 58 76 38 16 74 46 50 18 59 27 92 13
45 61 86 39 78 23 12 71 62 22 38 27 66 51 79
47 39 19 22 35 39 80 37 55 29 37 41 73 54 63
c) If this was a sample, what conclusion would you make about the impact of the disease?
d) Plot on the same axes the frequency polygon superimposed on the histogram.