
Basic Statistics

1) This document discusses basic statistics concepts including data, populations, tabulation of data, frequency distribution tables, and graphs. 2) Key points covered include defining target populations, sampling populations, and units of analysis. Methods of tabulating and presenting data in tables and graphs are also explained. 3) Common graphs discussed are bar graphs, line graphs, histograms, and pie charts. Their uses and properties are defined to help summarize categorical and numerical data visually.

Uploaded by

ZAKAYO NJONY

BASIC STATISTICS

INTRODUCTION
What is Statistics

This course will restrict itself to descriptive statistics and specifically: measures of central
tendency, measures of related position and measures of dispersion.
We shall first consider what data is and issues revolving about it.
Data

Categories of data

Types of data

Methods of data collection


These include:
(a) Questionnaires – filled in by respondents
(b) Interview schedules
(c) Observation
(d) Photographing/videoing

Populations
1. Target population
2. Accessible population
3. Sample population
4. Units – actual respondent filling questionnaire or being interviewed.

TABULATION OF DATA

Reasons for tabulating data


a) Comparison between different classes of data is made easy
b) Required data can easily be located
c) Unnecessary details can be avoided
d) Summarizes data and hence takes less space

Examples

1. In the year 2000, a certain firm employed 90 staff, of whom 79 were men. During the
year, 17 staff left, of whom 13 were men. The total recruitment during the year was 13,
of whom 3 were women.
In 2001, wastage among men declined by 3 compared with 2000 and no woman left. Six
more men were recruited than in the previous year, but only 1 woman. The total number of
employees as at 1st January 2002 was 93. Tabulate the above data.

Employees of a firm
                                      2000                    2001
                               Men   Women   Total     Men   Women   Total
Employees as at 1st January     79      11      90      76      10      86
Recruitment                     10       3      13      16       1      17
Left                          (13)     (4)    (17)    (10)       -    (10)
Total                           76      10      86      82      11      93

2. The following report was prepared by an exams officer on the performance of students in a
certain institution. Out of 3,500 male candidates below 20 years of age, 500 passed; of the
1,100 male candidates aged 20 years and over, 900 failed. As regards the female candidates, out
of 500 candidates below 20 years of age, 400 failed; of the 340 females aged 20 years and above,
80 passed. Present the data in tabular form.

Student Performance
                               Passed   Failed    Total
Below 20 years:      Men          500    3,000    3,500
                     Women        100      400      500
20 years and above:  Men          200      900    1,100
                     Women         80      260      340
Total                             880    4,560    5,440

FREQUENCY DISTRIBUTION TABLE


This is a table which shows how the total frequency is distributed over the various values
of a variable. This is done by grouping values of the variable into class intervals.
Frequency is the number of times an observation occurs. For large data sets, the construction
of a discrete frequency distribution is cumbersome, and therefore we construct a grouped
(continuous) frequency distribution, where the data is distributed into categories
(classes). The number of observations falling within each class is called the class
frequency.
To construct a continuous frequency distribution:
1. Determine the range R, i.e. the difference between the largest and smallest observations.
2. Select the number of classes (k), usually 5 ≤ k ≤ 20.
3. Determine the class size, i.e. c = R/k.
4. Determine the lowest boundary/limit such that it includes the least observation. The other
limits/boundaries are generated by adding the class size.
5. Tally the observations falling in each class to obtain the frequency distribution.

Example
1. The following marks were obtained by students in an exam:
40 46 51 41 33 65 48 43 36 71
74 39 56 50 58 40 37 68 37 25
54 55 55 49 38 44 59 73 44 50
47 56 61 41 40 42 58 66 38 39

Solution

Marks             25 – 34   35 – 44   45 – 54   55 – 64   65 – 74
No. of students         2        16         8         8         6
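
The tallying step can be sketched in Python for the marks data. This is a minimal sketch under stated assumptions: the function name `grouped_frequency` is hypothetical, the observations are whole numbers, and the lowest limit (25) and class size (10, roughly R/k = 49/5) are taken from the solution table rather than derived by the code.

```python
from collections import Counter

marks = [
    40, 46, 51, 41, 33, 65, 48, 43, 36, 71,
    74, 39, 56, 50, 58, 40, 37, 68, 37, 25,
    54, 55, 55, 49, 38, 44, 59, 73, 44, 50,
    47, 56, 61, 41, 40, 42, 58, 66, 38, 39,
]


def grouped_frequency(data, lowest_limit, class_size):
    """Tally whole-number observations into classes of equal size.

    Assumes lowest_limit <= min(data). Returns a list of
    ((lower, upper), frequency) pairs with inclusive class limits,
    e.g. (25, 34) for a class size of 10.
    """
    # Map every observation to the index of the class it falls in.
    tally = Counter((x - lowest_limit) // class_size for x in data)
    table = []
    for k in range(max(tally) + 1):
        lower = lowest_limit + k * class_size
        upper = lower + class_size - 1
        table.append(((lower, upper), tally.get(k, 0)))
    return table


marks_table = grouped_frequency(marks, lowest_limit=25, class_size=10)
for (lower, upper), freq in marks_table:
    print(f"{lower} - {upper}: {freq}")
```

The same function can be reused for the second example by passing the ages with a lowest limit of 30 and a class size of 5.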

2. The age at death of people in a certain village was recorded as follows:


38 68 39 55 60 61 56 49 51 35 58 48 58 47
65 50 52 39 53 43 42 51 62 47 55 58 54 52
46 65 45 55 46 42 52 34 59 53 48 48 60 52
Generate a grouped frequency distribution of class size = 5

Solution
Age             30 – 34  35 – 39  40 – 44  45 – 49  50 – 54  55 – 59  60 – 64  65 – 69
No. of deaths         1        4        3        9       10        8        4        3

GRAPHS

Definition of Graphs
A graph shows the relationship between two variables by means of a curve or a straight line.
Graphs present data and help in the analysis and interpretation of data.

Principles of Graph Construction

• The graph should have a clear and comprehensive title.
• The independent variable is always on the x-axis.
• The scale should always be uniform.
• The graph should not be overcrowded.
• Curves or lines must be distinct.
• Sources of data must always be indicated, preferably at the bottom.

Types of Graphs and their uses

Graphs
Graphs are visual representations of data. They can take on many forms. They allow people to
quickly absorb information, observe trends and to easily interpolate and extrapolate data. They
are much easier to understand than a large table of numbers and if well constructed, should
provide the same amount of information as the table. In presentations, it is usually preferable to
use graphs as they convey your point quickly and without the need to understand what each and
every number in a table means.

Bar Graph
A bar graph uses bars to show data. The bars can be vertical (up and down), or horizontal
(across). The data can be in words or numbers.

[Figure: Vertical Bar Graphs]

[Figure: Horizontal Bar Graphs]

In bar graphs, the greater the height (or length, for horizontal bars), the greater the value.

Uses:
Bar graphs are used to highlight separate quantities, especially the differences between these
quantities. They are extremely useful for comparing quantities in different categories, and can be
used to describe the relationship of several variables at once. The data typically being
represented is the number of "occurrences" measured in different categories of data. They are
used in almost every field.

Line Graph
One of the most popular types of graphs, line graphs have two axes. The horizontal axis (x-axis) is for
the independent variable, and the vertical axis (y-axis) is for the dependent variable. Points on
the graph are connected by lines, hence the name.

[Figure: Line Graph]

Uses:
Line graphs are typically used to show how a value changes over time, though the independent
variable can really be anything. In a general sense, they are used to show how one value
changes as another changes uniformly or incrementally. They are used in almost every field.

Histograms
Histograms are very similar to bar graphs, except that there is no space between the bars. They are
used to show the frequency distribution of a continuous variable (e.g. the heights of students in a class).

Uses:
They are used a lot in statistical work and demographics.
Pie Graph/ Circle Graph

Pie graphs, in their simplest form, are circles subdivided into different coloured regions. The
greater the sliced area, the greater the category’s value.

Uses:
Pie charts are typically used to summarize categorical data or, even more often, percentage data.
The components have to add up to make a "whole" of sorts, or else the graph becomes
meaningless (e.g. student population, market segments, etc.). A chunk may be separated from the
rest of the pie to indicate its significance. They are used in almost every field.

GRAPHS OF FREQUENCY DISTRIBUTIONS


A frequency distribution can be presented graphically in any of the following ways.
1. Histogram
2. Frequency polygon
3. Smoothed frequency curve
4. Cumulative frequency curves or “Ogives”

1. Histogram
A histogram is a graphical method for presenting data, where the observations are located on
a horizontal axis (usually grouped into intervals) and the frequency of those observations is
depicted along the vertical axis.

The histogram should be clearly distinguished from a bar diagram. The distinction lies in the
fact that whereas a bar diagram is one-dimensional, i.e. only the length of the bar is material
and not the width, a histogram is two-dimensional, that is, in a histogram both the length and
the width are important.

EXAMPLE
Represent the following data by a histogram.
Class interval   Frequency
0 – 10               5
10 – 20             11
20 – 30             19
30 – 40             21
40 – 50             16
50 – 60             10
60 – 70              8
70 – 80              6
80 – 90              3
90 – 100             1

Solution
The histogram of the above data is given below.

2. Frequency polygon
A frequency polygon is a graph of a frequency distribution. It has more than four sides. It is
particularly effective in comparing two or more frequency distributions.

Constructing Frequency Polygon


This can be done by drawing a histogram of the given data and then joining, by straight lines, the
mid-points of the upper horizontal sides of adjacent rectangles.
N.B. Two hypothetical classes, one at each end, have to be included, each with a frequency
of zero. This extension is made so that the area under the polygon equals the
area under the corresponding histogram.

EXAMPLE
The daily profits (in thousand shillings) of 100 shops are distributed as follows:

Daily profits:   0 – 50   50 – 100   100 – 150   150 – 200   200 – 250
No. of shops:        12         18          27          20          17

Required: Prepare a histogram and frequency polygon of the above data.

HISTOGRAM AND FREQUENCY POLYGON
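
The points to be joined for the frequency polygon can be worked out before drawing. The sketch below is illustrative only: the function name `polygon_points` is hypothetical, and it assumes classes of equal width. It computes the class mid-points for the daily profits data and adds the two zero-frequency classes at the ends.

```python
def polygon_points(boundaries, frequencies):
    """Return (mid-point, frequency) pairs for a frequency polygon.

    `boundaries` holds the class boundaries in order (one more entry
    than `frequencies`); classes are assumed to be of equal width.
    """
    width = boundaries[1] - boundaries[0]
    midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]
    points = list(zip(midpoints, frequencies))
    # Hypothetical empty classes at each end bring the polygon down to the x-axis.
    first = (midpoints[0] - width, 0)
    last = (midpoints[-1] + width, 0)
    return [first] + points + [last]


profit_boundaries = [0, 50, 100, 150, 200, 250]
shops = [12, 18, 27, 20, 17]
points = polygon_points(profit_boundaries, shops)
print(points)
```

Plotting these points in order and joining them with straight lines gives the polygon; the first and last points lie on the x-axis at the mid-points of the two added classes.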

3. Smoothed frequency curve


A smoothed frequency curve can be drawn through the various points of the polygon. The
curve is drawn free hand in such a manner that the area included under the curve is
approximately the same as that of the polygon.

Steps in drawing smoothed frequency curve


1. Draw a histogram.
2. Draw frequency polygon.
3. Smoothen with a free hand along the frequency polygon.

Example

Draw a histogram, frequency polygon and frequency curve representing the following figures.

Length of service (in years)   No. of employees
Less than 5                           5
5 – 10                               12
10 – 15                              25
15 – 20                              48
20 – 25                              32
25 – 30                               6
30 – 35                               1

Solution
[Figure: Histogram, frequency polygon and curve. Horizontal axis: length of service (years), 5 to 35; vertical axis: number of employees, 0 to 60.]

4. Cumulative frequency curves or “ogives”

Cumulative frequencies refer to the running totals of the frequencies. These are listed
in a table called a cumulative frequency table. The graph of such a distribution is called a
cumulative frequency curve or an ogive.

Example
The table below shows the distribution in weights of 75 pigs

Weights (kg)   Frequency
10 – 20            1
20 – 30            7
30 – 40            8
40 – 50           11
50 – 60           19
60 – 70           10
70 – 80            7
80 – 90            5
90 – 100           4
100 – 110          3

Required: Draw cumulative frequency curve

Solution

Weight (Kg) Frequency Cumulative Frequency

10 – 20 1 1
20 – 30 7 8
30 – 40 8 16
40 – 50 11 27
50 – 60 19 46
60 – 70 10 56
70 – 80 7 63
80 – 90 5 68
90 – 100 4 72
100 – 110 3 75

CUMULATIVE FREQUENCY CURVE
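
The cumulative frequency column is a running total, which can be sketched with `itertools.accumulate`. The variable names are illustrative, and pairing each cumulative frequency with the upper class boundary (a "less than" ogive starting at the lowest boundary with frequency 0) is an assumption about how the curve is drawn.

```python
from itertools import accumulate

upper_boundaries = [20, 30, 40, 50, 60, 70, 80, 90, 100, 110]
frequencies = [1, 7, 8, 11, 19, 10, 7, 5, 4, 3]

# Running total of the frequencies.
cumulative = list(accumulate(frequencies))

# Points of a "less than" ogive: start at the lowest boundary with 0,
# then plot each cumulative frequency against its upper class boundary.
ogive_points = [(10, 0)] + list(zip(upper_boundaries, cumulative))

for weight, cf in ogive_points:
    print(f"less than {weight} kg: {cf}")
```

The last cumulative frequency should always equal the total number of observations, which is a quick check on the table.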

HOMEWORK

The table below shows the number of people killed in road accidents by different vehicles.

Type of vehicle      Lorries   Cars   Buses   Matatus   Others
Number of people          10     40      22        16       12

Construct a pie chart to represent the data
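
A pie chart needs the angle of each sector, which is that category's share of the total multiplied by 360°. The following is a small sketch of that arithmetic only (it does not draw the chart), and the dictionary name `killed` is illustrative.

```python
killed = {"Lorries": 10, "Cars": 40, "Buses": 22, "Matatus": 16, "Others": 12}

total = sum(killed.values())

# Sector angle in degrees = (category count / total) * 360.
angles = {vehicle: round(count * 360 / total, 1) for vehicle, count in killed.items()}

for vehicle, angle in angles.items():
    print(f"{vehicle}: {angle} degrees")
```

The angles must add up to 360°, which confirms that the components make up a whole.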

Measures of Central Tendency


THE MEAN
a) Arithmetic mean (A.M.)
- the most commonly used average
- $\text{A.M.} = \dfrac{\sum x}{\sum f} = \dfrac{\sum x}{N}$, where $N = \sum f$ is the number of observations

Example
Find the arithmetic mean for the following data:
2, 4, 6, 8, 10

Therefore $\text{A.M.} = \dfrac{\sum x}{\sum f} = \dfrac{2 + 4 + 6 + 8 + 10}{5} = \dfrac{30}{5} = 6$

b) Geometric mean (G.M.)

$$\text{G.M.} = \sqrt[N]{x_1 \cdot x_2 \cdot x_3 \cdots x_N}$$

Given: 1, 2, 3, 4, 5, 6

$$\text{G.M.} = \sqrt[6]{1 \cdot 2 \cdot 3 \cdot 4 \cdot 5 \cdot 6} = \sqrt[6]{720} = 2.99$$

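c) Harmonic mean (H.M.)

$$\text{H.M.} = \frac{N}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \cdots + \dfrac{1}{x_N}}$$

Given the data: 2, 3, 4, 5, 6

$$\text{H.M.} = \frac{5}{\frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6}} = \frac{5}{\dfrac{30 + 20 + 15 + 12 + 10}{60}} = \frac{5 \times 60}{87} = \frac{300}{87} = 3.448$$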
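
The three means can be checked numerically. The functions below are a minimal sketch that follows the definitions directly (Python's `statistics` module offers equivalents); the function names are illustrative, and `math.prod` requires Python 3.8 or later.

```python
import math


def arithmetic_mean(xs):
    """Sum of the observations divided by their number."""
    return sum(xs) / len(xs)


def geometric_mean(xs):
    """N-th root of the product of N positive observations."""
    return math.prod(xs) ** (1 / len(xs))


def harmonic_mean(xs):
    """N divided by the sum of the reciprocals of the observations."""
    return len(xs) / sum(1 / x for x in xs)


print(arithmetic_mean([2, 4, 6, 8, 10]))
print(round(geometric_mean([1, 2, 3, 4, 5, 6]), 2))
print(round(harmonic_mean([2, 3, 4, 5, 6]), 3))
```

For any set of positive observations that are not all equal, the harmonic mean is smaller than the geometric mean, which in turn is smaller than the arithmetic mean.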

THE MEDIAN
a) Discrete data
Median is the middle value when data is arranged in ascending or descending order.

$$\text{Median} = \left(\frac{N+1}{2}\right)^{\text{th}} \text{ observation}$$

e.g. for 1, 2, 3, 4, 5: $\text{Median} = \left(\frac{5+1}{2}\right)^{\text{th}}$ observation = 3rd observation = 3

b) Grouped data
When determining the class in which the median lies, use N/2 to locate the
median class and NOT $\frac{N+1}{2}$, because for grouped data it is N/2 which divides
the area under the curve into two equal parts.

$$\text{Median} = L + \frac{N/2 - \text{p.c.f}}{f} \times i$$

where L = lower class limit of the median class
p.c.f = preceding cumulative frequency to the median class
f = frequency of the median class
i = the class interval of the median class

Example
1,500 workers are working in an industrial establishment. Their ages are classified as follows:

Age (yrs) No. of workers
18 – 22 120
22 – 26 125
26 – 30 280
30 – 34 260
34 – 38 155
38 – 42 184
42 – 46 162
46 – 50 86
50 – 54 75
54 – 58 53

Required: Calculate the median age

Solutions
Age group F C.F
18 – 22 120 120
22 – 26 125 245
26 – 30 280 525
30 – 34 260 785
34 – 38 155 940
38 – 42 184 1,124
42 – 46 162 1,286
46 – 50 86 1,372
50 – 54 75 1,447
54 – 58 53 1,500
Median = size of the $\left(\frac{N}{2}\right)^{\text{th}}$ observation = 1,500/2 = 750th observation

Hence the median lies in the class 30 – 34.

$$\text{Median} = L + \frac{N/2 - \text{p.c.f}}{f} \times i = 30 + \frac{750 - 525}{260} \times 4 = 30 + 3.46 = 33.46$$

Hence the median age of the workers is 33.46 years.
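
The same calculation can be traced step by step in code. This is a sketch under the assumption of equal class widths; the function name `grouped_median` and its parameters are illustrative.

```python
from itertools import accumulate


def grouped_median(lower_limits, frequencies, class_width):
    """Median of grouped data: L + (N/2 - p.c.f) / f * i."""
    n = sum(frequencies)
    target = n / 2  # position of the median observation
    cumulative = list(accumulate(frequencies))
    # The median class is the first class whose cumulative frequency reaches N/2.
    for idx, cf in enumerate(cumulative):
        if cf >= target:
            break
    pcf = cumulative[idx - 1] if idx > 0 else 0  # preceding cumulative frequency
    return lower_limits[idx] + (target - pcf) / frequencies[idx] * class_width


age_lower_limits = [18, 22, 26, 30, 34, 38, 42, 46, 50, 54]
workers = [120, 125, 280, 260, 155, 184, 162, 86, 75, 53]

median_age = grouped_median(age_lower_limits, workers, class_width=4)
print(round(median_age, 2))
```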

Related Measures of Position


Besides the median, there are other measures which divide a series into an equal number of parts.
Important amongst these are quartiles, deciles and percentiles.

Quartiles – are those values of the variate which divide the total frequency into four
equal parts.

Deciles – divide the total frequency into 10 equal parts.

Percentiles – divide the total frequency into 100 equal parts.

The quartiles are denoted by symbol Q, deciles by D and percentiles by P.

Computation of Quartiles, Deciles and Percentiles


The procedure for computing quartiles, deciles and percentiles is the same as for the median. For grouped data,
the following formulae are used.

$$Q_k = L + \frac{kN/4 - \text{p.c.f}}{f} \times i \quad \text{for } k = 1, 2, 3$$

where L = lower class limit of the respective quartile class
p.c.f = preceding cumulative frequency to the respective quartile class
f = frequency of the respective quartile class
i = the class interval of the respective quartile class

$$D_k = L + \frac{kN/10 - \text{p.c.f}}{f} \times i \quad \text{for } k = 1, 2, 3, \ldots, 9$$

where L = lower class limit of the respective decile class
p.c.f = preceding cumulative frequency to the respective decile class
f = frequency of the respective decile class
i = the class interval of the respective decile class

$$P_k = L + \frac{kN/100 - \text{p.c.f}}{f} \times i \quad \text{for } k = 1, 2, 3, \ldots, 99$$

where L = lower class limit of the respective percentile class
p.c.f = preceding cumulative frequency to the respective percentile class
f = frequency of the respective percentile class
i = the class interval of the respective percentile class

Example
In an examination of 675 candidates the examiner supplied the following information
Marks obtained (in %) No of candidates
0 – 10 7
10 – 20 32
20 – 30 56
30 – 40 106
40 – 50 180
50 – 60 164
60 – 70 86
70 – 80 44

Calculate Q1 and median

Solution
Marks (in %) No of candidates (f) C.F
0 – 10 7 7
10 – 20 32 39
20 – 30 56 95
30 – 40 106 201
40 – 50 180 381
50 – 60 164 545
60 – 70 86 631
70 – 80 44 675
Q1 = size of the $\left(\frac{N}{4}\right)^{\text{th}}$ observation = 675/4 = 168.75th observation
Q1 lies in the class 30 – 40.

$$Q_1 = L + \frac{N/4 - \text{p.c.f}}{f} \times i = 30 + \frac{168.75 - 95}{106} \times 10 = 30 + 6.96 = 36.96$$

Thus 25% of the students scored 36.96% or less.

Median = Q2 = size of the $\left(\frac{N}{2}\right)^{\text{th}}$ observation = 675/2 = 337.5th observation
Median lies in the class 40 – 50.

$$\text{Median} = Q_2 = L + \frac{N/2 - \text{p.c.f}}{f} \times i = 40 + \frac{337.5 - 201}{180} \times 10 = 47.5833$$
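
Quartiles, deciles and percentiles differ only in the number of equal parts, so one function can cover all three. The sketch below is illustrative: the name `grouped_quantile` and its `parts` parameter (4, 10 or 100) are assumptions made for this example, and classes are assumed to be of equal width.

```python
from itertools import accumulate


def grouped_quantile(k, parts, lower_limits, frequencies, class_width):
    """k-th quantile when the total frequency is split into `parts` equal parts.

    parts=4 gives quartiles, parts=10 deciles, parts=100 percentiles.
    Implements L + (kN/parts - p.c.f) / f * i.
    """
    n = sum(frequencies)
    target = k * n / parts
    cumulative = list(accumulate(frequencies))
    # First class whose cumulative frequency reaches the target position.
    idx = next(i for i, cf in enumerate(cumulative) if cf >= target)
    pcf = cumulative[idx - 1] if idx > 0 else 0
    return lower_limits[idx] + (target - pcf) / frequencies[idx] * class_width


mark_lower_limits = [0, 10, 20, 30, 40, 50, 60, 70]
candidates = [7, 32, 56, 106, 180, 164, 86, 44]

q1 = grouped_quantile(1, 4, mark_lower_limits, candidates, class_width=10)
median = grouped_quantile(2, 4, mark_lower_limits, candidates, class_width=10)
print(round(q1, 2), round(median, 4))
```

Other measures follow the same pattern: a fourth decile would use `grouped_quantile(4, 10, ...)` and an eightieth percentile `grouped_quantile(80, 100, ...)`.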

Exercise: Use the above example to calculate D4 and P80, and interpret the values.

THE MODE

Grouped Data

$$M_0 = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$$

L = lower limit of the modal class
$\Delta_1$ = the difference between the frequency of the modal class and the frequency of the pre-modal
class, i.e. the preceding class
$\Delta_2$ = the difference between the frequency of the modal class and the frequency of the post-modal
class, i.e. the succeeding class
i = the size of the modal class
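
As an illustration, the formula can be applied to the marks of the 675 candidates from the quartile example, where the class 40 – 50 has the highest frequency. This is a sketch only: the function name `grouped_mode` is hypothetical, it assumes equal class widths and a single modal class, and it treats a missing neighbouring class as having frequency 0.

```python
def grouped_mode(lower_limits, frequencies, class_width):
    """Mode of grouped data: L + d1 / (d1 + d2) * i."""
    m = frequencies.index(max(frequencies))  # index of the modal class
    f_prev = frequencies[m - 1] if m > 0 else 0
    f_next = frequencies[m + 1] if m < len(frequencies) - 1 else 0
    d1 = frequencies[m] - f_prev  # modal minus preceding class frequency
    d2 = frequencies[m] - f_next  # modal minus succeeding class frequency
    return lower_limits[m] + d1 / (d1 + d2) * class_width


mark_limits = [0, 10, 20, 30, 40, 50, 60, 70]
candidate_counts = [7, 32, 56, 106, 180, 164, 86, 44]

modal_mark = grouped_mode(mark_limits, candidate_counts, class_width=10)
print(round(modal_mark, 2))
```

Here the two differences are 180 − 106 = 74 and 180 − 164 = 16, so the mode is 40 + (74/90) × 10.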

Measures of variation
A good measure of variation should possess, as far as possible, the following properties.
It should be:
(i) Simple to understand
(ii) Easy to compute
(iii) Rigidly defined
(iv) Based on each and every observation of the distribution
(v) Not unduly affected by extreme observations

Methods of studying variations


1. The range
2. The interquartile range or quartile deviation
3. The average deviation
4. The standard deviation
5. The Lorenz curve

Of these, the first four are mathematical methods and the last is a graphical one.

Range = L – S, where L = largest observation and S = smallest observation

Interquartile range = Q3 – Q1

Semi-interquartile range or quartile deviation: $\text{Q.D.} = \dfrac{Q_3 - Q_1}{2}$
Standard Deviation

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{N}} \quad \text{or} \quad s = \sqrt{\frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2}$$

Example

Find the standard deviation from the weekly wages of ten workers working in a factory.

Workers Weekly wages (KShs)


A 1320
B 1310
C 1315
D 1322
E 1326
F 1340
G 1325
H 1321
I 1320
J 1331

$\bar{x} = 1323$ and $s = \sqrt{\dfrac{622}{10}} = 7.89$

Example

A) We first find the mean of 1,5,4,2,8 before we compute variance and standard deviation

B) We make a table to assist in computation of the required measures.
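
Steps A) and B) can be sketched as follows. The code assumes the divisor N (the whole data set is treated as the population), matching the formula for s given earlier; a sample standard deviation with divisor N − 1 would give a larger value. Variable names are illustrative.

```python
import math

data = [1, 5, 4, 2, 8]
n = len(data)

# A) The mean.
mean = sum(data) / n

# B) Table of x, deviation from the mean, and squared deviation.
rows = [(x, x - mean, (x - mean) ** 2) for x in data]
print(f"{'x':>3} {'x - mean':>9} {'(x - mean)^2':>13}")
for x, dev, sq in rows:
    print(f"{x:>3} {dev:>9.1f} {sq:>13.1f}")

sum_sq = sum(sq for _, _, sq in rows)
variance = sum_sq / n  # divisor N, as assumed above
std_dev = math.sqrt(variance)
print(f"mean = {mean}, variance = {variance}, s = {std_dev:.3f}")
```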

Exercise I
1. The following distribution gives the pattern of overtime work per week done by 100
employees of a company. Calculate median, first quartile (Q1) and 7th decile (D7)

Overtime hours     10 – 15   15 – 20   20 – 25   25 – 30   30 – 35   35 – 40
No. of employees        11        20        35        20         8         6

Answers: Median = 22.714
Q1 = 18.5
D7 = 26

Exercise II
A certain disease affects children and sometimes kills them. The data below show the age (in
months) at death of children dying from the disease.
84 91 58 72 44 67 76 43 83 40 73 86 77 75 71
43 33 78 94 65 74 50 65 80 57 73 36 33 91 53
63 59 46 47 37 11 82 40 27 84 53 19 35 72 44
19 51 67 58 76 38 16 74 46 50 18 59 27 92 13
45 61 86 39 78 23 12 71 62 22 38 27 66 51 79

47 39 19 22 35 39 80 37 55 29 37 41 73 54 63

a) Construct a frequency distribution table with intervals 10-19, 20-29, ...
b) Estimate the mean, mode and median age at death and comment on your results.
c) If this was a sample, what conclusion would you draw about the impact of the disease?
d) Plot on the same axes the frequency polygon superimposed on the histogram.
