Bio Statistics For Medical Students

The document provides lecture notes on basic biostatistics. It defines biostatistics as the application of statistical methods to biological phenomena. The notes cover topics such as uses of biostatistics, general steps in a research process, population and sampling, scales of measurement, variables, and systems for collecting data. Examples are provided to illustrate key concepts.

Uploaded by

OPIMA ALBERT


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/339499419

Lecture notes on Biostatistics.

Book · February 2020


1 author:

Hamze ALI Abdillahi, Medical lecturer.

All content following this page was uploaded by Hamze ALI Abdillahi on 26 February 2020.


Dr-Hamze ALI ABDILLAHI

GOLLIS UNIVERSITY -ERIGAVO

2/26/2018 By Dr. HAMZE ALI ABDILLAHI 1

Basic biostatistics

Introduction
• Statistics:
A field of study concerned with the collection, organization and summarization of data, and the drawing of inferences about a body of data when only part of the data are observed.
• Biostatistics:
The application of statistical methods to biological phenomena. It has also been defined as:
The science of assembling and interpreting numerical data (Bland 2000)
The discipline concerned with the treatment of numerical data derived from groups of individuals (Armitage et al., 2001)
Uses of Biostatistics
• Hospital utility statistics
• Resource allocation
• Vaccination uptake
• Magnitude of a disease/condition
• Assessing risk factors
• Disease frequency
• Making a diagnosis and choosing an appropriate treatment (implicit use of probability).

Statistics can be used to:
1. Draw conclusions
2. Make predictions about what will happen in other subjects

Examples
1. At Hargeisa general hospital, 5% of the patients were diagnosed with diabetes mellitus (DM) last year.
2. Kat chewers are 3 times more likely to have a myocardial infarction (MI) than non-chewers.
3. Antibiotics reduce the duration of viral throat infections by 1-2 days.
Medical research vs. Clinical Practice

Medical research:
• Data are collected from individual subjects.
• Aim is to be able to make some general statements about a wider set of subjects.

Clinical practice:
• Data are collected from individual subjects.
• Interested in the particular subjects that have been studied.

General steps in a research process
What does Biostatistics cover?

1. Planning
2. Design
3. Data collection
4. Data processing
5. Data presentation
6. Data analysis
7. Interpretation
8. Publication
Population & Sample
• Population: a complete set of items or subjects which can be studied.
• Target population: a collection of items that have something in common, about which we wish to draw conclusions at a particular time.
• Study population: the specific population from which data are collected.
• Sample: a subset of the study population (a smaller part of that population).
Generalizability:
Generalization is a two-stage procedure: we want to generalize conclusions from the sample to the study population, and then from the study population to the target population.

Example
In a study of the prevalence of Kat chewing among secondary students in Somalia, a random sample of secondary students in Hargeisa was taken.
Target population: all secondary students in Somalia
Study population: all secondary students in Somaliland
Sample: secondary students in Hargeisa

[Figure: diagram relating the sample, the study population and the target population]

Parameter:
A descriptive measure computed from the data of a population (a quantity calculated from the population). E.g. the mean serum glucose of the population is 100 mg/dl.
Statistic:
A descriptive measure computed from the data of a sample (a quantity calculated from the sample). E.g. the mean serum glucose of the sample is 110 mg/dl.
Scales of measurement (types of data)
• Clearly not all measurements are the same.
• Measuring an individual's weight is qualitatively different from measuring their response to some treatment on a three-category scale: "improved", "stable", "not improved".
• Measuring scales differ according to the degree of precision involved.

Types of scales of measurement.
There are four types of scales of measurement:
A. QUALITATIVE DATA:
1. Nominal scale (cannot be ordered):
uses names, labels, or symbols to assign each measurement to one of a limited number of categories that cannot be ordered.
Examples:
Blood type (A/B/AB/O), sex (male/female), race (Somali/Oromo), marital status (married/not married/divorced). If there are only two possible categories the data are said to be dichotomous (e.g. sex, male/female).
2. Ordinal scale (can be ordered):
Examples:
• Severity = mild, moderate, severe
• Socio-economic status = upper, middle, lower
B. QUANTITATIVE DATA (numerical data):
Continuous data:
• Interval scale
• Ratio scale
Discrete data (numbers)

3. Interval scale (equally spaced intervals):
assigns each measurement to one of an unlimited number of categories that are equally spaced. It has no true zero point.
Example:
body temperature measured in Celsius or Fahrenheit, heart rate measured per second. Thus the difference between 5 kg and 10 kg is the same as that between 20 kg and 25 kg.
This kind of measurement can be converted into a dichotomous nominal scale, e.g. afebrile (oral temp < 37) and febrile (> 37); it can also be ordered (ordinal scale).

4. Ratio scale:
measurement begins at a true zero point and the scale has equal spacing. Ratio data are similar to interval data but also have a true zero, so the ratio of two measurements is meaningful.
Examples: height, weight, blood pressure.

5. Discrete data (numbers):
All values are clearly separated from each other, although numbers are used.
Examples: number of surgical operations performed in one month; number of newly diagnosed psychiatric patients last year.
Variables
• Variable: a characteristic which takes different values in different persons, places, or things.
• Qualitative variable: the notion of magnitude is absent or implicit.
• Quantitative variable: a variable that has magnitude.
• Discrete variable: it can only have a finite number of values in any given interval.
• Continuous variable: it can have an infinite number of possible values in any given interval.
Data
The term DATA refers to items of information.
Systems for collecting data
1. Regular system (routine data collecting system): registration of events as they become available.
2. Ad hoc system (non-routine): a form of survey to collect information that is not available on a regular basis.
Examples:
1. Routine system:
• Census: enumeration of all individuals in a country on a fixed day.
• Vital registrations: births, deaths, marriages, divorces, etc.
• Disease notification: international notification (e.g. cholera) and national notification (e.g. polio, cholera, hepatitis); notification goes from district level to national level to international level.
• Disease registry: TB, cancer, stroke, birth defects
• Medical records: schools, colleges, industries
• Hospital records
• Environmental health records
2. Non-routine:
1. Disease surveillance: polio, malaria, AIDS; it is important for control, prevention and eradication.
2. Surveys: nutritional status, by interview, examination or postal enquiry.
3. Social schemes: medical insurance, sickness absenteeism, disability benefits, welfare schemes
4. Economic data: consumption of goods, export and import, drugs, employment; helps the planning commission in formulating health policies.
5. Demographic data: population movement, major epidemics
Sources of data
1. Primary data: collected directly from the items or individual respondents for the purpose of a particular study.
2. Secondary data: data that have already been collected and statistically treated by other people or agencies, and whose information is then used for another purpose.

Biostatistics
Methods of summarizing and displaying data

Presenting qualitative data

Charts and tables used to present qualitative data

1. Pie charts
2. Bar charts (simple and clustered bar charts)
3. Relative frequency (percentage) table

The two chart types listed are used for the presentation of qualitative data.

Pie charts
Pie charts are typically used to present the relative frequency of qualitative data.
In most cases the data are nominal, but ordinal data can also be displayed in a pie chart.
• The complete circle represents the total number of measurements.
• The circle is partitioned into slices, one for each category.
• The size of a slice is proportional to the relative frequency of that category.
• Determine the angle of each slice by multiplying the relative frequency by 360 degrees (recall that a circle spans 360 degrees).
Steps to create a pie chart
• Construct a frequency table.
• Calculate the relative frequency (percentage) of each category.
• Change the relative frequencies into degrees, where degrees = relative frequency × 360° (equivalently, percentage/100 × 360°).
• Draw a circle and divide it accordingly.

For a single variable:
For example, in a class of 40 students, 15 are boys and 25 are girls.
[Figure: pie chart of boys and girls in the class]
Frequency: the number of times that something occurs.
Relative frequency = frequency divided by the sum of all frequencies:

$\text{Relative frequency} = \dfrac{\text{Frequency}}{\text{Sum of all frequencies}}$

Angle computations:
Since a circle has 360 degrees, the degree measure of the sector for each category will be:
Boys: 15/40 = 0.375, so 0.375 × 360 = 135
Girls: 25/40 = 0.625, so 0.625 × 360 = 225
Total = 360
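The angle computation can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the class example; the function name `pie_angles` and the dictionary layout are assumptions made for the sketch, not part of the notes.

```python
def pie_angles(counts):
    """Return {category: (relative_frequency, angle_in_degrees)} for a pie chart."""
    total = sum(counts.values())
    return {
        category: (count / total, count / total * 360)
        for category, count in counts.items()
    }

# Class of 40 students from the example: 15 boys and 25 girls.
class_counts = {"boys": 15, "girls": 25}
angles = pie_angles(class_counts)

for category, (rel_freq, angle) in angles.items():
    print(f"{category}: relative frequency {rel_freq:.3f}, angle {angle:.0f} degrees")
```

The angles of all slices should always add up to 360, which is a quick check on the frequency table.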
Bar Chart (Bar Graph):
• Place categories on the horizontal axis.
• Place frequency (or relative frequency) on the vertical axis.
• Construct vertical bars of equal width, one for each category.
• The height of each bar is proportional to the frequency (or relative frequency) of the category.
[Figure: simple bar chart]
Two variables (cross tabulation)
Cross tabulations (cross tabs) are often used to present the counts of two qualitative variables.
Suppose the variables of interest are:
• Gender, and
• wearing spectacles.
They are presented in this table.

          Wearing spectacles
          Yes     No      Total
Boys      5       10      15
Girls     10      15      25
Total     15      25      40

Two variables (qualitative)
The cross tabulation can also be expressed as row percentages.

Table showing the percentage of boys and girls wearing spectacles.

          Wearing spectacles
          Yes       No        Total
Boys      33.33%    66.67%    100%
Girls     40%       60%       100%
Total     37.50%    62.50%    100%

Crosstabs and clustered bar chart
[Figure: clustered bar chart of gender by wearing spectacles]
Expressed in percentages, 33.33% of the boys and 40% of the girls wear spectacles.
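The row percentages in the spectacles example can be reproduced with a short Python sketch. The nested-dictionary layout and the helper name `row_percentages` are illustrative assumptions; the counts are those of the cross tabulation in the notes.

```python
# Counts from the gender by spectacles cross tabulation.
counts = {
    "Boys": {"Yes": 5, "No": 10},
    "Girls": {"Yes": 10, "No": 15},
}

def row_percentages(table):
    """Convert each row of counts into percentages of that row's total."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: round(100 * n / row_total, 2) for col, n in cells.items()}
    return result

# Add a "Total" row by summing each column before converting to percentages.
with_total = dict(counts)
with_total["Total"] = {
    col: sum(row[col] for row in counts.values()) for col in ("Yes", "No")
}

percentages = row_percentages(with_total)
for row, cells in percentages.items():
    print(row, cells)
```

Percentages are taken within each row here because the question of interest is what share of boys, and what share of girls, wear spectacles.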

Calculate the percentages
Smoking Lung cancer Total
YES NO
YES 70 100
NO 3 70

BIOSTATISTICS
Methods of Displaying and Summarizing quantitative data
Frequencies and frequency distribution tables:

Frequency distribution: a table listing all observed values of the variable being studied and how many times each value is observed.
The number of times that something occurs is known as its frequency.
The notation $f_x$ is used to denote the frequency, or number of times, the value x occurs.
The relative frequency is the frequency divided by the sample size n (the sum of all frequencies):

$\text{Relative frequency} = \dfrac{\text{Frequency}}{\text{Sum of all frequencies}}$

Table: obtaining frequency, cumulative frequency and percentage

Age     Frequency   Cumulative   Relative       Cumulative relative
                    frequency    frequency %    frequency %
13      1           1            3              3
14      7           8            23             26
15      5           13           17             43
16      6           19           20             63
17      6           25           20             83
18      2           27           7              90
19      3           30           10             100
Total   30                       100

• Cumulative frequency: frequencies are added up.
• Relative frequency %: for example 1/30 × 100 = 3% and 7/30 × 100 = 23%.
• Cumulative relative frequency: the sum of all relative frequencies below and including each category.
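As a rough sketch, the columns of the age table can be computed in Python as follows. The variable names are illustrative. Note one small difference: the sketch computes the cumulative percentage from the cumulative counts (8/30 = 26.67% for age 14), whereas the table adds the already rounded percentages and shows 26.

```python
from itertools import accumulate

# Ages and frequencies from the table above.
ages = [13, 14, 15, 16, 17, 18, 19]
freqs = [1, 7, 5, 6, 6, 2, 3]

n = sum(freqs)                               # sample size (30)
cum_freqs = list(accumulate(freqs))          # running totals of the frequencies
rel_freqs = [100 * f / n for f in freqs]     # relative frequency in percent
cum_rel_freqs = [100 * c / n for c in cum_freqs]

print("Age  f  cf  rel%  cum rel%")
for age, f, cf, rf, crf in zip(ages, freqs, cum_freqs, rel_freqs, cum_rel_freqs):
    print(f"{age:>3} {f:>2} {cf:>3} {rf:>5.1f} {crf:>8.1f}")
```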

Steps in constructing the frequency distribution table for quantitative data:
1. Data are first divided into a number of intervals.
2. Then the number of data points falling within each interval is presented as the frequency or count for that interval.
3. Tally the data in the tally column and obtain the class frequencies.

Smoothing class intervals to obtain class boundaries: compute the correction factor

$d = \dfrac{\text{lower limit of second class} - \text{upper limit of first class}}{2}$

• Subtract $d$ from the lower class limits to get the lower class boundaries.
• Add $d$ to the upper class limits to get the upper class boundaries.

Sturges' rule:

$K = 1 + 3.322 \log_{10} n$

$C = \dfrac{R}{K}$

where K = number of class intervals, n = number of observations, C = class width, and R (range) = maximum value - minimum value.
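A minimal Python sketch of Sturges' rule follows. The function names are assumptions for illustration, and the inputs (n = 120, minimum 18.3, maximum 38.8) are taken from the BMI example used in these notes. Sturges' rule is only a guide, so the suggested number of classes is normally rounded to a convenient whole number.

```python
import math

def sturges_classes(n):
    """Suggested number of class intervals K = 1 + 3.322 * log10(n)."""
    return 1 + 3.322 * math.log10(n)

def class_width(minimum, maximum, k):
    """Class width C = R / K, where the range R = maximum - minimum."""
    return (maximum - minimum) / k

n = 120
k = sturges_classes(n)                # a little under 8
width = class_width(18.3, 38.8, round(k))

print(f"K = {k:.2f}, rounded to {round(k)} classes")
print(f"C = {width:.2f}")
```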

The beginning and end of each interval are called the class boundaries, and the point midway between the two boundaries of an interval is called the class mark or midpoint.
For example:
Table: Body Mass Index Data for a Sample of 120 U.S. Adults: Ordered Array

18.3 21.9 23.0 24.3 25.4 26.6 27.5 28.8 30.9 34.4
19.2 21.9 23.1 24.3 25.6 26.9 27.5 28.8 30.9 34.9
19.8 21.9 23.1 24.5 25.7 27.1 27.6 28.9 31.0 35.0
20.2 22.3 23.3 24.6 25.7 27.3 28.2 29.3 31.1 35.5
20.7 22.3 23.4 24.6 25.8 27.3 28.3 29.5 31.3 35.8
20.8 22.3 23.5 24.7 25.8 27.3 28.3 29.8 31.6 35.9
21.1 22.4 24.0 24.7 25.9 27.3 28.3 30.0 31.6 36.6
21.1 22.5 24.0 24.8 25.9 27.4 28.4 30.1 32.6 37.1
21.1 22.7 24.0 24.8 26.2 27.4 28.6 30.2 32.8 37.5
21.3 22.7 24.1 25.0 26.5 27.4 28.7 30.3 33.2 37.8
21.3 22.8 24.1 25.4 26.5 27.4 28.7 30.8 33.6 38.2
21.5 22.9 24.2 25.4 26.5 27.4 28.8 30.8 34.2 38.8

Usually, for a data set of 100 to 150 observations, the number of intervals chosen ranges from about 5 to 10.
In our example, the range of the data is 38.8 - 18.3 = 20.5. Suppose we divide the data set into seven intervals. Then we have 20.5 ÷ 7 = 2.93, which rounds to 3.0. So the intervals have a width of 3.
These seven intervals are as follows:
o 18.0 – 20.9
o 21.0 – 23.9
o 24.0 – 26.9
o 27.0 – 29.9
o 30.0 – 32.9
o 33.0 – 35.9
o 36.0 – 38.9

Frequency distribution table

Class interval    Frequency   Cumulative       Relative        Cumulative relative
for BMI levels    (f)         frequency (cf)   frequency (%)   frequency (%)
18.0 – 20.9       6           6                5.00            5.00
21.0 – 23.9       24          30               20.00           25.00
24.0 – 26.9       32          62               26.67           51.67
27.0 – 29.9       28          90               23.33           75.00
30.0 – 32.9       15          105              12.50           87.50
33.0 – 35.9       9           114              7.50            95.00
36.0 – 38.9       6           120              5.00            100.00
Total             120                          100.00
Graphs for displaying quantitative data include:
o Histogram
o Frequency polygon and ogive
o Stem-and-leaf plot
o Box and whisker plot (used when we are constructing quartiles)
o Scatter plot (used in correlation and regression analysis)
Histogram & frequency polygons:
Frequency distributions are often displayed with a histogram, which looks like a bar chart but has no space between the bars. The heights of the bars represent either the number or the percentage of observations within each interval.
Frequency polygons, which are essentially a line connecting the middle of the top of each bar of the histogram, are also used extensively.

To construct a histogram:
• Draw the interval boundaries on a horizontal line and the frequencies on a vertical line.
• Non-overlapping intervals that cover all of the data values must be used.
• Bars are then drawn over the intervals in such a way that the areas of the bars are all proportional in the same way to their interval frequencies.
Using the above data we can construct a histogram and a polygon using Excel.

[Figure: histogram of relative frequency (%) for BMI data; horizontal axis: class interval, vertical axis: relative frequency]
[Figure: frequency polygon for BMI data; horizontal axis: class interval, vertical axis: frequency]
[Figure: cumulative frequency polygon (ogive) for BMI data; horizontal axis: class interval, vertical axis: cumulative frequency]
[Figure: relative frequency polygon for BMI data; plotted values 5, 20, 26.67, 23.33, 12.5, 7.5 and 5 across the seven class intervals]

Cumulative relative frequency using Ogive
Another way of representing quantitative data is the ogive, which is the graphical presentation of the cumulative relative frequency. Sometimes it may become necessary to know the number of items whose values are more or less than a certain amount. We can use the ogive to estimate the cumulative relative frequencies of other values.
For example, about 75% of the respondents have a BMI less than 30, as the frequency distribution table shows.
Stem-and-leaf plot
Example 4: HbA1c from diabetic patients (in %)
7.1 8.0 7.2 7.5 6.4
6.8 8.2 9.1 7.8 8.1

Stem   Leaf
6      48
7      1258
8      012
9      1
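The stem-and-leaf display above can be built mechanically, as in this Python sketch. It assumes values recorded to one decimal place, so that the whole-number part is the stem and the first decimal is the leaf; the function name is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group one-decimal values into {stem: [leaves]} in ascending order."""
    plot = defaultdict(list)
    for value in sorted(values):
        tenths = round(value * 10)        # e.g. 7.1 -> 71
        stem, leaf = divmod(tenths, 10)   # 71 -> stem 7, leaf 1
        plot[stem].append(leaf)
    return dict(plot)

# HbA1c values (%) from Example 4.
hba1c = [7.1, 8.0, 7.2, 7.5, 6.4, 6.8, 8.2, 9.1, 7.8, 8.1]
plot = stem_and_leaf(hba1c)

for stem in sorted(plot):
    print(f"{stem} | {''.join(str(leaf) for leaf in plot[stem])}")
```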

Advantages of Stem-and-leaf plot:
• Orders the data, so that the maximum and minimum are evident
• Gaps in the data become evident
• All the data are displayed
• The shape of the data becomes clearer

Box and Whisker plot
[Figure: box and whisker plot]
It is another way to display information when the objective is to illustrate certain locations in the distribution. A box plot is a good alternative or complement to a histogram and is usually better for showing several simultaneous comparisons.
It is useful for the detection of outliers.
It displays the median, minimum, maximum, first quartile (Q1), third quartile (Q3) and inter-quartile range (IQR).
1. A box is drawn with the top of the box at the third quartile and the bottom at the first quartile.
2. The location of the mid-point of the distribution is indicated with a horizontal line in the box, which is the median (Q2).
3. Finally, straight lines, or whiskers, are drawn from the centre of the top of the box to the largest observation and from the centre of the bottom of the box to the smallest observation.
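The quantities a box plot displays can be computed as in the following Python sketch. It reuses the HbA1c values from the stem-and-leaf example purely as illustrative data. The quartiles come from the standard library's `statistics.quantiles` with `method="inclusive"`; this is one of several quartile conventions, so other software or hand methods may give slightly different Q1 and Q3.

```python
import statistics

def five_number_summary(values):
    """Return the minimum, Q1, median, Q3, maximum and IQR of the data."""
    ordered = sorted(values)
    # One of several quartile conventions; results can differ slightly by method.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],
        "Q1": q1,
        "median": q2,
        "Q3": q3,
        "max": ordered[-1],
        "IQR": q3 - q1,
    }

hba1c = [7.1, 8.0, 7.2, 7.5, 6.4, 6.8, 8.2, 9.1, 7.8, 8.1]
summary = five_number_summary(hba1c)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```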

Scatter plot
To illustrate the relationship between two characteristics when both are quantitative variables we use bivariate plots (also called scatter plots or scatter diagrams).
[Figure: scatter plot showing height and weight of newborn babies]
Summation notation
Summation notation is simply a way of saying that a collection of numbers is to be added.
Generally, some letter is used to represent whatever is being measured; the letter X is the most common choice.
The notation $X_1$ is used to indicate the first observation. The next observation is $X_2$, and so on.
Typically, n is used to represent the total number of observations, and the observations themselves are represented by $X_1, X_2, \ldots, X_n$.

In symbols, adding the numbers $X_1, X_2, \ldots, X_n$ is denoted by

$\sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n,$

where $\sum$ is an upper-case Greek sigma. The subscript i is the index of summation, and the 1 and n that appear respectively below and above the symbol $\sum$ designate the range of the summation.
The 1 is where the X values start and the n is where the values end.
Sometimes, the sum extends over all n observations, in which case it is customary to omit the index of summation. That is, simply use the notation

$\sum X_i = X_1 + X_2 + \cdots + X_n.$

For example, suppose the observations are 1.2, 2.2, 6.4, 3.8, 0.9. Then

$\sum_{i=2}^{4} X_i = 2.2 + 6.4 + 3.8 = 12.4$

and

$\sum X_i = 1.2 + 2.2 + 6.4 + 3.8 + 0.9 = 14.5.$
Another common arithmetic operation is squaring each observed value and summing the results.
This is written as:

$\sum X_i^2 = X_1^2 + X_2^2 + \cdots + X_n^2$

Adding all the values and then squaring the sum is written as:

$\left(\sum X_i\right)^2$

For example:

$\sum X_i^2 = 1.2^2 + 2.2^2 + 6.4^2 + 3.8^2 + 0.9^2 = 62.49$

$\left(\sum X_i\right)^2 = (1.2 + 2.2 + 6.4 + 3.8 + 0.9)^2 = 14.5^2 = 210.25$

Let c be any constant. In some situations it helps to note that multiplying each value by c and adding the results is the same as first computing the sum and then multiplying by c. This is written as:

$\sum cX_i = c \sum X_i$

For example:

$\sum 60X_i = 60 \sum X_i = 60 \times 14.5 = 870.$

Another common operation is to subtract a constant from each observed value, square each difference, and add the results. In summation notation, this is written as:

$\sum (X_i - c)^2.$

For example, suppose we want to subtract 2.9 from each value, square each of the results, and then sum these squared differences. So c = 2.9, and

$\sum (X_i - c)^2 = (1.2 - 2.9)^2 + (2.2 - 2.9)^2 + \cdots + (0.9 - 2.9)^2 = 20.44.$
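Each summation above maps directly onto a one-line sum in Python. The sketch below reproduces the worked numbers; the variable names are illustrative, and results are rounded when printed because floating-point arithmetic is not exact.

```python
x = [1.2, 2.2, 6.4, 3.8, 0.9]   # the five observations X1..X5
c = 2.9                          # the constant used in the last example

total = sum(x)                                   # sum of all Xi
partial = sum(x[1:4])                            # sum of X2..X4 (Python indexes from 0)
sum_of_squares = sum(v ** 2 for v in x)          # square each value, then add
square_of_sum = total ** 2                       # add all values, then square
scaled_sum = sum(60 * v for v in x)              # equals 60 * total
squared_deviations = sum((v - c) ** 2 for v in x)

print(round(total, 2), round(partial, 2))
print(round(sum_of_squares, 2), round(square_of_sum, 2))
print(round(scaled_sum, 2), round(squared_deviations, 2))
```

Note that the sum of squares and the square of the sum are different quantities, which is why the placement of the parentheses matters.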

Basic Biostatistics
Measures of central tendency

1. Mean - average (arithmetic mean)
2. Median - middle value
3. Mode - most frequently observed value(s).

Means, medians, and modes are methods of measuring the central tendency of a group of values, that is, the tendency for values in a group to gather around a central or average value which is typical of the group.
To avoid biased reporting, central tendency should be addressed collectively, based on all three measures: mean, median and mode.

Formula for the mean (arithmetic mean):

$\bar{x} = \dfrac{\sum x_i}{n}$

Mean
The mean is the sum of all the values in a data set, divided by the number of values. The mean of a whole population is usually denoted by $\mu$ (called mu), while the mean of a sample is usually denoted by $\bar{x}$ (called x-bar).
To calculate the mean:
• Sum up all the values.
• Divide the sum by the number of values.

The sample mean is a simple point estimate for the population mean; it is just the average of the data collected. The mean is very sensitive to outliers, and the estimate can be biased in the presence of extreme values, unlike the median and mode, where a change to an extreme value usually has no effect.
Mean of the ungrouped data:
Example:
The HbA1c results of patients with diabetes are: 4.0, 5.4, 4.6, 6.0.
Calculate the mean of the data.

Result

$\text{Mean} = \dfrac{4.0 + 5.4 + 4.6 + 6.0}{4} = \dfrac{20}{4} = 5$

The mean of the HbA1c is 5. Remember that when writing the mean, it is good practice to refer to the unit of measurement; in this case it is an HbA1c value of 5%.

Example 2
• Data set is 4, 7, 5, 9, 5. Calculate the mean.
• Data set is 10, 12, 16, 14. Calculate the mean.

Result

$M = \dfrac{4 + 7 + 5 + 9 + 5}{5} = 6$

$M = \dfrac{10 + 12 + 16 + 14}{4} = 13$
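The two steps for the mean (sum the values, then divide by their number) can be written as a very small Python function. The name `mean` is chosen for illustration; Python's standard library also provides `statistics.mean`, which does the same job.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

hba1c = [4.0, 5.4, 4.6, 6.0]      # first example, in %
example_a = [4, 7, 5, 9, 5]       # Example 2, first data set
example_b = [10, 12, 16, 14]      # Example 2, second data set

print(round(mean(hba1c), 2))
print(mean(example_a))
print(mean(example_b))
```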
Mean of the grouped data
In calculating the mean from grouped data, we assume that all values falling into a particular class interval are located at the mid-point of the interval. It is calculated as follows:

$\bar{x} = \dfrac{\sum m_i f_i}{\sum f_i}$

where $m_i$ is the mid-point and $f_i$ the frequency of the i-th class interval.

Example:

Age     fi     mi    mifi
15-19   11     17    187
20-24   36     22    792
25-29   28     27    756
30-34   13     32    416
35-39   7      37    259
40-44   3      42    126
45-49   2      47    94
Total   100          2630

Mean = 2630/100 = 26.3

Trimmed mean
A trimmed mean removes a fixed proportion of the smallest and largest values and averages what remains. The median is the extreme case: it trims all but one or two values.
No specific amount of trimming is always best, but 20% trimming is often a good choice in the literature. This means that the smallest 20%, as well as the largest 20%, are trimmed and the average of the remaining data is computed. There are circumstances where the extreme amount of trimming can be beneficial, but sometimes it can be detrimental.

Computation of trimmed mean:
• First compute 0.2 × n.
• Round down to the nearest integer.
• Call this result g.

The formula for the 20% trimmed mean is given by:

$\bar{X}_t = \dfrac{1}{n - 2g}\left(X_{(g+1)} + \cdots + X_{(n-g)}\right)$

where $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ are the values in ascending order.

Example
Data values are:
46, 12, 33, 15, 29, 19, 4, 24, 11, 31, 38, 69, 10
Calculate the trimmed mean.

Ordered data:
4, 10, 11, 12, 15, 19, 24, 29, 31, 33, 38, 46, 69.
The number of values is n = 13, and 0.2(n) = 0.2(13) = 2.6.
• Rounding this down to the nearest integer yields g = 2.
• That is, trim the two smallest values, 4 and 10, and the two largest values, 46 and 69.
• Average the numbers that remain, yielding:

$\bar{X}_t = \dfrac{1}{9}(11 + 12 + 15 + 19 + 24 + 29 + 31 + 33 + 38) = 23.56.$
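The trimming steps can be sketched in Python as below. The function name and the `proportion` parameter are assumptions for illustration; the default of 0.2 corresponds to the 20% trimming described in the notes.

```python
import math

def trimmed_mean(values, proportion=0.2):
    """Drop the g smallest and g largest values, g = floor(proportion * n), then average."""
    ordered = sorted(values)
    n = len(ordered)
    g = math.floor(proportion * n)    # number of values trimmed from each end
    kept = ordered[g:n - g]           # X(g+1) ... X(n-g)
    return sum(kept) / len(kept)

data = [46, 12, 33, 15, 29, 19, 4, 24, 11, 31, 38, 69, 10]
result = trimmed_mean(data)
print(round(result, 2))
```

With n = 13 the sketch trims two values from each end and averages the remaining nine, as in the worked example.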
Median
The second measure, the median, is the middle number of a set of numbers arranged in numerical order.
To calculate the median of ungrouped data:
• First arrange the values in order of size and then find the middle value.
• If the number of observations, n, is odd, the location of the sample median is m = (n+1)/2.
• If the number of observations, n, is even, the median is the sum of the two middle values (at positions n/2 and n/2 + 1) divided by 2. The formula m = (n+1)/2 can be used for both odd and even n; for even n it gives a position halfway between the two middle values.

Finding the location of the median:
Location of the median = (n+1)/2

Example 1
Median of the ungrouped data
Find the median of 13, 3, 20, 22 and 25.
Ordered data: 3, 13, 20, 22, 25. The location of the median is (n+1)/2 = (5+1)/2 = 3, so the median is the third data value, which is 20.

Example 2
If there is an even number of values, use the mean of the two middle values. For example, for the values 3, 13, 13, 20, 22, 25 the location is (n+1)/2 = (6+1)/2 = 3.5, so the median lies between the third and fourth values: median = (13 + 20)/2 = 16.5. The median is the point that divides a distribution of scores into two equal halves.
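The odd and even cases can be handled in one small Python function, sketched here with an illustrative name. Because Python lists are indexed from 0, the middle value of an odd-length list sits at index n // 2.

```python
def median(values):
    """Middle value of the ordered data; the mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the two middle values

odd_result = median([13, 3, 20, 22, 25])
even_result = median([3, 13, 13, 20, 22, 25])
print(odd_result, even_result)
```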

Median of the Grouped data

$\text{Median} = L_m + \left(\dfrac{n/2 - F_c}{F_m}\right) W$

where
1. Lm = lower true class boundary of the interval containing the median.
2. Fc = cumulative frequency of the interval just above (preceding) the median class interval.
3. Fm = frequency of the interval containing the median.
4. W = class interval width.
5. n = total number of observations.

Example:

Age     fi    Cum. F
5-14    5     5
15-24   10    15
25-34   20    35
35-44   22    57
45-54   13    70
55-64   5     75
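The notes give no worked result for this table, so the following Python sketch is only one way to carry out the calculation from the definitions of Lm, Fc, Fm, W and n. It assumes the usual grouped-median formula Lm + ((n/2 - Fc)/Fm) × W and assumes that the true class boundaries lie half a unit beyond the stated limits (for example 34.5 to 44.5 for the class 35-44). The function name is hypothetical.

```python
def grouped_median(classes):
    """classes: list of (lower_boundary, upper_boundary, frequency) in ascending order."""
    n = sum(freq for _, _, freq in classes)
    cumulative = 0                       # Fc: cumulative frequency before the current class
    for lower, upper, freq in classes:
        if cumulative + freq >= n / 2:   # first class whose cumulative frequency reaches n/2
            return lower + (n / 2 - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("no classes given")

# Age table above, with assumed true class boundaries.
age_classes = [
    (4.5, 14.5, 5),
    (14.5, 24.5, 10),
    (24.5, 34.5, 20),
    (34.5, 44.5, 22),
    (44.5, 54.5, 13),
    (54.5, 64.5, 5),
]
age_median = grouped_median(age_classes)
print(round(age_median, 2))
```

Under these assumptions the median class is 35-44, because the cumulative frequency first reaches n/2 = 37.5 there.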
The mean versus the median
• The mean is sensitive to outliers.
• The median is not sensitive to outliers.
• When the data are highly skewed, the median is usually preferred.
• When the data are not skewed, the median and the mean will be very close.
Mode
The last measure is the mode, which is the most frequently occurring number.
Example: 3, 13, 13, 20, 22, 25: the mode = 13. It is usually more informative to quote the mode accompanied by the percentage of times it occurred; e.g. the mode is 13 with 33% of the occurrences. In medical research, the mean and median are usually presented. A set can have more than one mode; if it has two, it is said to be bimodal.

Example
Ordered data: 1, 1, 3, 3, 4, 5, 60
The mean is: 77/7 = 11
The location of the median is (n+1)/2 = (7+1)/2 = 4
So the median is the fourth data value, m = 3
Mode = the most frequent number in the data set, which is 1 and 3, so the data set is bimodal.

Mode of the grouped data

$\text{Mode} = L_o + \left(\dfrac{D_1}{D_1 + D_2}\right) C_o$

where
Lo = the lower boundary of the modal class
D1 = difference in frequency between the modal class and the one before
D2 = difference in frequency between the modal class and the one after
Co = the width of the modal class
Note: the modal class is the one that contains the highest frequency.

Example

class          mi (midpoint)   fi    fc
9.5 – 13.5     11.5            3     3
13.5 – 17.5    15.5            4     7
17.5 – 21.5    19.5            8     15
21.5 – 25.5    23.5            3     18
25.5 – 29.5    27.5            2     20
Sum                            20

Calculate the mode, mean and median of the data.
Mode: the third class has the largest frequency (8), so the class 17.5 – 21.5 is the modal class.
For the modal class, Lo = 17.5, D1 = (8-4) = 4, D2 = (8-3) = 5 and Co = (21.5 - 17.5) = 4.
So the mode = 17.5 + (4/(4+5)) × 4 = 19.28

Result
• Mean = 378/20 = 18.9
• Median = 19
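The three grouped-data measures for this example can be checked with the Python sketch below. The function names are illustrative, and the formulas are the standard grouped-data ones assumed from the symbol definitions in the notes: midpoint-weighted mean, Lm + ((n/2 - Fc)/Fm) × W for the median, and Lo + (D1/(D1 + D2)) × Co for the mode.

```python
# (lower boundary, upper boundary, frequency) for each class in the example.
classes = [
    (9.5, 13.5, 3),
    (13.5, 17.5, 4),
    (17.5, 21.5, 8),
    (21.5, 25.5, 3),
    (25.5, 29.5, 2),
]

def grouped_mean(classes):
    """Sum of midpoint * frequency divided by the total frequency."""
    n = sum(f for _, _, f in classes)
    return sum((lo + hi) / 2 * f for lo, hi, f in classes) / n

def grouped_median(classes):
    """Lm + ((n/2 - Fc) / Fm) * W for the class where the cumulative frequency reaches n/2."""
    n = sum(f for _, _, f in classes)
    cumulative = 0
    for lo, hi, f in classes:
        if cumulative + f >= n / 2:
            return lo + (n / 2 - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("no classes given")

def grouped_mode(classes):
    """Lo + (D1 / (D1 + D2)) * Co for the class with the highest frequency."""
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))
    lo, hi, f_modal = classes[k]
    before = freqs[k - 1] if k > 0 else 0
    after = freqs[k + 1] if k < len(freqs) - 1 else 0
    d1, d2 = f_modal - before, f_modal - after
    return lo + d1 / (d1 + d2) * (hi - lo)

g_mean = grouped_mean(classes)
g_median = grouped_median(classes)
g_mode = grouped_mode(classes)
print(round(g_mean, 2), round(g_median, 2), round(g_mode, 2))
```

The mean and median agree with the results stated in the notes (18.9 and 19), which supports the formulas assumed here.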

Measures of dispersion
1. Range
2. Variation (SS): the sum of squared deviations from the mean
3. Variance (S2)
4. Standard deviation (S)
5. Standard error (SE)
6. Quartiles and inter-quartile range (IQR)
7. Coefficient of variation (CV)

Range
Is the difference between the maximum and the
minimum data values.
R = XL − XS, where XL is the largest value and
XS is the smallest value.
It is the simplest measure and is easily
understood. It takes into account only two values,
which makes it a poor measure of
dispersion. One application is in quality control
charts, especially when small sample sizes are
involved.
For example:
Data set: 4, 5, 6, 7, 14
The maximum value is 14 and the
minimum value is 4, so the range
is 14 − 4 = 10
Variation (SS)
Variation is the sum of squared deviations from the
mean:
SS = Σ (x − x̄)², where x̄ is the mean of the data set.

Variation is used in the construction of
analysis of variance (ANOVA) tables,
which will be discussed later.

Variance (S²)

The variance is the average of the squared
deviations from the mean.
Variance = variation divided by (n − 1), i.e. S² = SS / (n − 1).

Variance is used to account for the sample size.

A small data set with a larger dispersion
(points far from each other) may show a
smaller computed variation than a large data set,
simply because fewer squared deviations are
summed in the small data set.

Note:
The variation is divided by (n − 1) instead
of n. When the variation is divided by n, the
formula is said to be biased because it
underreports the dispersion, especially in
small data sets.

With a large data set, using n as the
denominator makes little difference.

To calculate the variance:

1. Calculate the mean of the distribution.
2. Find the difference between each score and the
mean.
3. Square each of these results.
4. Sum these squared deviations (differences).
5. Count the number of observed values, subtract 1,
and divide the sum from step 4 by this number. The
result is called the variance. (This is the
average squared deviation from the mean.)

Standard deviation (S)
It is the square root of the variance. In variation,
the unit of measurement is in squared
form, and dividing by (n − 1) to obtain the
variance leaves the unit in squared form.
To bring it back to the original unit of measurement,
the square root of the variance
must be taken.
The standard deviation (SD) quantifies
variability or scatter. It describes the
variability of the population
distribution and tells us what we could
expect about individuals in the population.
The standard deviation computed this way
(with a denominator of N − 1) is called the
sample SD, in contrast to the population SD,
which would have a denominator of N. (N − 1)
is known as the degrees of freedom. The SD is
always reported alongside the mean value.
For example, the mean cholesterol is 5.2 ±
0.6 mmol/l.
• The SD is a parameter used in establishing data
symmetry and normality, which will be
discussed later.

• The SD is also used in quality control charts to
monitor the process variation from time to
time.

Steps in calculating SD
1. Find the mean.
2. Subtract the mean from every value in the group individually;
this shows the deviation from the mean for every
value.
3. Work out the square (x²) of every deviation (that is,
multiply each deviation by itself, e.g. 5 × 5); this
produces a squared deviation for every value.
4. Add up all of the squared deviations.
5. Count the number of observed values, and subtract 1.
6. Divide the sum of squared deviations by this number
to produce the sample variance.
7. Work out the square root of the variance.
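
The seven steps above can be sketched in Python as follows. This is a minimal illustration rather than part of the original notes. The function name `sample_sd_steps` is hypothetical, and the data set 10, 12, 16, 14 is the one used in the problem later in these notes.

```python
import math

def sample_sd_steps(values):
    """Follow the seven steps; return (variation SS, variance, SD)."""
    n = len(values)
    mean = sum(values) / n                     # step 1: find the mean
    deviations = [x - mean for x in values]    # step 2: deviation of every value
    squared = [d * d for d in deviations]      # step 3: square every deviation
    ss = sum(squared)                          # step 4: sum of squared deviations
    degrees_of_freedom = n - 1                 # step 5: number of values minus 1
    variance = ss / degrees_of_freedom         # step 6: sample variance
    sd = math.sqrt(variance)                   # step 7: square root of the variance
    return ss, variance, sd

ss, variance, sd = sample_sd_steps([10, 12, 16, 14])
print(ss, round(variance, 2), round(sd, 2))
```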

Standard error of the mean (SEM)

SE quantifies the precision of the mean. It is a
measure of the precision of a sample statistic: it tells
us how precise our estimate of the parameter
is, i.e. how far the sample mean
is likely to be from the true population mean.

Standard error (SE) = SD / √n

To calculate SE, divide the SD by the
square root of n, the sample size.

It is an indication of sample-to-sample
variation.

For example, if we took a large number of
samples of a particular size from a
population and recorded the mean of each
sample, we could calculate the SD of all their
means; this is called the SE. Because the
means of samples vary less than individual
values do, the SE is smaller than the SD.
It is used in hypothesis testing and the
calculation of confidence intervals.
The difference between the SD and
SEM

Students often confuse the standard
deviation (SD) with the standard
error of the mean (SEM).

a) The SD quantifies scatter: how
much the values vary from one
another.
b) The SEM quantifies how accurately
you know the true mean of the population. The
SEM gets smaller as your samples get
larger, because the mean of a large
sample is likely to be closer to the true
population mean than is the mean of a
small sample.
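
As a rough illustration of the SD/SEM distinction, the sketch below computes both for one small sample and then shows how the SEM would shrink if the same SD came from larger samples. The function name `standard_error` and the sample sizes 4, 16 and 64 are assumptions chosen for illustration.

```python
import math
from statistics import stdev

def standard_error(values):
    """SEM = sample SD divided by the square root of the sample size."""
    return stdev(values) / math.sqrt(len(values))

sample = [10, 12, 16, 14]
sd = stdev(sample)            # scatter of the individual values
se = standard_error(sample)   # precision of the sample mean

print("SD:", round(sd, 2), "SEM:", round(se, 2))

# Same SD, larger (hypothetical) sample sizes: the SEM gets smaller.
for n in (4, 16, 64):
    print(n, round(sd / math.sqrt(n), 2))
```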

Example
Data set = 4, 7, 5, 9, 5.
Calculate:
a) Mean
b) Maximum & minimum
c) Range
d) Variation
e) Variance
f) Standard deviation
g) Standard error
Result
a) Mean = 30/5 = 6
b) Maximum = 9, minimum = 4
c) Range = 9 – 4 = 5
d) Variation, SS = (−2)² + 1² + (−1)² + 3² + (−1)² = 16
e) Variance, S² = 16/4 = 4
f) Standard deviation = √4 = 2
g) Standard error = 2/√5 = 0.89
Problem
Data set: 10, 12, 16, 14
Calculate:
a) Mean
b) Maximum & minimum
c) Range
d) Variation
e) Variance
f) Standard deviation
g) Standard error of the mean
Result
a) Mean = 52/4 = 13
b) Maximum = 16, minimum = 10
c) Range = 16 – 10 = 6
d) Variation, SS = 20
e) Variance, S² = 20/3 = 6.67
f) Standard deviation = 2.58
g) Standard error of the mean = 2.58/√4 = 1.29
Measures of dispersion 2
• Quartiles and inter-quartile range
• Coefficient of variation
• Detecting outliers
Quartiles
Values which divide the sorted data set into
four equal parts, so that each part represents
25% of the data. The quartiles are the
25th percentile, the 50th percentile and the 75th
percentile. One quarter of the values are less
than or equal to the 25th percentile. The
median is the 50th percentile.

• Q1 gives the cut-point for the lower 25% of the data set.
• Q2 is the median.
• Q3 gives the cut-point for the upper 25% of the data set.
Uses of quartiles
1. The quartiles (Qs) and the inter-quartile range (IQR = Q3 − Q1) are used in the construction of the box plot.
2. The box plot can be used to detect outliers in a data set.
3. An outlier is a value more than 1.5 IQRs below Q1 or above Q3.
4. Quartiles are reported with the median.
Finding the location of Quartiles

Example:
Data set: 10, 12, 16 and 14.
Calculate the:
• Mean
• Median
• Standard deviation
• Quartiles
• CV %
Mean = 13, median = 13, SD = 2.58
Ordered data = 10, 12, 14 and 16.
Coefficient of variation (CV)
• Also known as relative variability.
• It is a measure of normalised dispersion.
• It is the ratio of a measure of spread
to a measure of location: CV = (SD / mean) × 100%.
• It is expressed in percentage form.
• A small value implies that the spread is small with
respect to the location and that there is a high level
of precision.
• It is often used for the evaluation of
instrument reliability.
• Because it is a unit-less ratio, you can
compare the CVs of variables expressed in
different units.

Example
Data set: 10, 12, 16 and 14.
Calculate the coefficient of variation.
Mean = 13, SD = 2.58
CV = (2.58 / 13) × 100 = 19.8%
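
A small sketch of the CV calculation for this data set is given below. The function name `coefficient_of_variation` is hypothetical. Because the code uses the unrounded SD, its result (about 19.9%) can differ in the last digit from a hand calculation that uses SD = 2.58.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Return the CV in percent: sample SD divided by the mean, times 100."""
    return stdev(values) / mean(values) * 100

cv = coefficient_of_variation([10, 12, 16, 14])
print(f"CV = {cv:.1f}%")
```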
Detecting outliers
• Outliers are values that are unusually
large or small.
• A single outlier can grossly affect the
sample mean and variance.
• The detection of outliers is important
for a variety of reasons.
• Detecting an outlier can help recognize
erroneously recorded results.
Simple approaches to detecting outliers are:
1. Look at the data and check the data entry.
2. Apply a classic outlier detection method.
3. Inspect graphs of the data (box plot).
A classic outlier detection method
• A classic outlier detection technique
illustrates the problem of masking.
• This classic technique declares the value X an
outlier if
|X − X̄| / s > 2
where X̄ is the sample mean and s is the sample standard deviation.

For example

Data values are:

2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 1,000.
The sample mean is X̄ = 65.94 and the sample standard
deviation is s = 249.1.
|1,000 − 65.94| / 249.1 = 3.75
Since 3.75 is greater than 2, the value 1,000 is
declared an outlier.
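Another example
Data values are:
2, 2, 3, 3, 3, 4, 4, 4, 100,000, 100,000.

The sample mean is X̄ = 20,002.5 and the sample
standard deviation is s = 42,162.38, so

|100,000 − 20,002.5| / 42,162.38 = 1.897

Since 1.897 is less than 2, the value 100,000 is not declared an
outlier, even though it is clearly unusual. The two extreme values
inflate the mean and standard deviation so much that they hide
each other; this is masking.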
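
The masking effect can be reproduced with a short Python sketch. This is illustrative only. The function name `classic_outliers` is hypothetical, and the threshold of 2 is the one used in the worked examples above.

```python
from statistics import mean, stdev

def classic_outliers(values, threshold=2.0):
    """Return the values X for which |X - mean| / s exceeds the threshold."""
    m, s = mean(values), stdev(values)
    return [x for x in values if abs(x - m) / s > threshold]

# One extreme value: it is flagged.
single = [2, 2, 3, 3, 3, 4, 4, 4, 100_000]
print(classic_outliers(single))

# Two extreme values: they inflate the mean and SD and mask each other.
masked = [2, 2, 3, 3, 3, 4, 4, 4, 100_000, 100_000]
ratio = abs(100_000 - mean(masked)) / stdev(masked)
print(round(ratio, 3), classic_outliers(masked))
```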
The box plot rule
The box plot rule is another method of outlier detection.
It is based on the fundamental strategy of
avoiding masking by replacing the mean and
standard deviation with measures of
location and dispersion that are relatively
insensitive to outliers.
This rule is based on the lower and upper
quartiles, as well as the inter-quartile range,
which provide resistance to outliers.
The box plot rule declares the value X an
outlier if
X < q1 − 1.5(q2 − q1) or
X > q2 + 1.5(q2 − q1)
where q1 is the lower quartile (Q1) and q2 is the upper quartile (Q3).
For example:
Data values are:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 100, 500.
The lower quartile is q1 = 4.417 and the upper quartile is q2 =
12.583,
so q2 + 1.5(q2 − q1) = 12.583 + 1.5(12.583 − 4.417) = 24.83.
That is, any value greater than 24.83 is declared an outlier.
Hence, the values 100 and 500 are labeled outliers.
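
The box plot rule can be sketched as below. The quartile values quoted above (4.417 and 12.583) match the "ideal fourths" estimator, so that estimator is used here. This is an assumption about how they were computed, and other quartile conventions give slightly different cut-points. The function names are hypothetical.

```python
import math

def ideal_fourths(values):
    """Lower and upper quartiles (q1, q2) by the ideal-fourths estimator (assumed)."""
    x = sorted(values)
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 values")
    j = math.floor(n / 4 + 5 / 12)
    h = n / 4 + 5 / 12 - j
    k = n - j + 1
    # x is 0-indexed, so the j-th ordered value is x[j - 1].
    q1 = (1 - h) * x[j - 1] + h * x[j]
    q2 = (1 - h) * x[k - 1] + h * x[k - 2]
    return q1, q2

def boxplot_outliers(values):
    """Values below q1 - 1.5(q2 - q1) or above q2 + 1.5(q2 - q1)."""
    q1, q2 = ideal_fourths(values)
    spread = q2 - q1
    low, high = q1 - 1.5 * spread, q2 + 1.5 * spread
    return [x for x in values if x < low or x > high]

data = list(range(1, 15)) + [100, 500]
q1, q2 = ideal_fourths(data)
print(round(q1, 3), round(q2, 3), boxplot_outliers(data))
```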
Types of Data

Data
• Categorical (Qualitative)
• Numerical (Quantitative)
  o Discrete
  o Continuous

Types of Sampling Methods

Samples are drawn from the population in one of two ways:
• Non-probability samples
  o Convenience sampling
  o Judgment sampling
  o Quota sampling
  o Snowballing sampling
• Probability samples
  o Simple random sampling
  o Stratified random sampling
  o Systematic random sampling
  o Cluster sampling
Probability means the chance of an
occurrence. To compute the chance of
occurrence, we need to know all the items in
the population.

Sampling frame refers to the complete list of all
the items in the population.

Random means that every item in the
population has an equal chance of being
picked.
Why sampling?
Investigating the entire population by a census:
• Is costly
• Is time consuming
• Requires large manpower
Sampling is a more cost-effective and convenient
means of collecting information.

Simple Random Sampling

• Every individual or item from the frame has an
equal chance of being selected.
• Samples are obtained from:
  o a table of random numbers, or
  o computer random number generators.
Advantages of simple random sampling
• Minimal knowledge of the population needed
• Allows statistical estimation of error
• Easy to analyze data
Disadvantages
• High cost; low frequency of use
• Requires a sampling frame
• Does not use researchers' expertise
• Larger risk of random error than
stratified sampling

Table of random numbers
68420  57957  41825  63291
58210  36215  40785  96020
36253  33425  47789  12203
98562  63101  78424  50536
• Locate one row and one column in the table.
• Close the eyes and use a pencil to choose any number.
• Say the chosen starting point is 5821 (the start of the second row).
• Read the digits horizontally; they can also be read vertically down.
• Split the digits into two-digit numbers, for example 58, 21, 03, …
• Remove repeated numbers and arrange the selected
numbers in order.
For example, in a class of 40 students, each student has a 1/40
(0.025) chance of being picked.
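
A computer random number generator can replace the printed table. The sketch below draws a simple random sample from a class of 40. The sample size of 5, the student numbering and the fixed seed are assumptions made only so the illustration is repeatable.

```python
import random

students = list(range(1, 41))   # sampling frame: students numbered 1..40 (hypothetical labels)

rng = random.Random(2018)       # fixed seed only so the sketch is repeatable
chosen = sorted(rng.sample(students, k=5))   # draws without replacement, so no repeats

print(chosen)
```

`random.sample` never returns the same student twice, which corresponds to removing repeated numbers when using the table.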

Systematic random sampling

• Decide on sample size: n
• Divide the frame of N individuals into groups of k individuals:
k = N/n
• Randomly select one individual from the 1st group
• Select every k-th individual thereafter.

Example: N = 64, n = 8, k = 64/8 = 8
• The first group contains individuals 1 – 8; suppose the first
random number within the range 1 – 8 is 3.
• Then the next number is 3 + 8 = 11, the third is 11 + 8 =
19, and so on.
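
The procedure can be sketched in Python as follows. The function name `systematic_sample` is hypothetical, and the sketch assumes that N is an exact multiple of n, as in the example (N = 64, n = 8).

```python
import random

def systematic_sample(N, n, start=None, rng=random):
    """Pick one individual at random from the first k, then every k-th one."""
    k = N // n                      # sampling interval
    if start is None:
        start = rng.randint(1, k)   # random start within the first group (1..k)
    return [start + i * k for i in range(n)]

# Reproduce the example: the random start happens to be 3.
picked = systematic_sample(64, 8, start=3)
print(picked)
```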

Advantages: Systematic Sampling
• Moderate cost; moderate usage
• Allows statistical estimation of error
• Simple to draw sample; easy to verify
Disadvantages
• Requires a sampling frame
• Potential for bias if there are
underlying patterns in the sampling
frame
Stratified Samples
• Population divided into two or more
groups (strata) according to some common
characteristic, so that the members of each
stratum are similar.
• Simple random sample selected from each
group
• The two or more samples are combined
into one.
Advantages
• Minimal knowledge of population needed
• Allows statistical estimation of
error
• Easy to analyze data
Disadvantages
• High cost
• Requires sampling frame
• Does not use researchers' expertise
• Unhelpful if there are no homogeneous
groups

For example:
We have 16 boys and 24 girls in a class, and we want to
stratify the class by gender.
• First divide the class list into two (boys and girls lists).
• We want to select 5 students from the sampling frame.
• The number of subjects taken from each stratum is usually
proportionate to the size of the stratum in the population.
Sampling fraction = 5/40 × 100 = 12.5%. The number of boys will be
16 × 12.5/100 = 2, so we select two boys from the boys' list
using simple random sampling.
The number of girls = 24 × 12.5/100 = 3, so we select 3 girls
from the girls' list using simple random
sampling.
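
The proportional allocation in this example can be sketched as below. The student labels (B1…B16, G1…G24), the function name and the fixed seed are hypothetical. In general the rounded allocations may not add up exactly to the required sample size, although they do here.

```python
import random

def proportional_allocation(strata_sizes, sample_size):
    """Number to draw from each stratum, proportional to the stratum size."""
    total = sum(strata_sizes.values())
    return {name: round(size * sample_size / total)
            for name, size in strata_sizes.items()}

# Hypothetical class lists split by gender.
strata = {
    "boys": [f"B{i}" for i in range(1, 17)],    # 16 boys
    "girls": [f"G{i}" for i in range(1, 25)],   # 24 girls
}

allocation = proportional_allocation({k: len(v) for k, v in strata.items()}, 5)

rng = random.Random(0)   # fixed seed only so the sketch is repeatable
# Simple random sample within each stratum, then combine.
stratified = [s for name, members in strata.items()
              for s in rng.sample(members, allocation[name])]

print(allocation, stratified)
```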

Cluster Samples
• Population divided into several “clusters,”
each representative of the population
• Simple random sample selected
from each
• The samples are combined into one
(Figure: population divided into 4 clusters.)

Cluster sampling is useful when it
is difficult or costly to develop a
complete list of the population
members or when the population
elements are widely dispersed
geographically.

Cluster sampling may increase
sampling error due to similarities
among cluster members.
Advantages
• Low cost
• Requires only a list of all clusters
• Can estimate characteristics of both
cluster and population
Disadvantages
• Increased sampling error
Stratification vs. Clustering
Stratification
• Divide population into groups different from each other:
sexes, races, ages
• Sample randomly from each group
• Less error compared to simple random
• More expensive to obtain stratification
information before sampling
Clustering
• Divide population into comparable groups:
schools, cities
• Randomly sample some of the groups
• More error compared to simple random
• Reduces costs to sample only some
areas or organizations

Non-probability Samples

Used when the sampling frame is
absent.
1. Convenience sampling
2. Quota sampling
3. Judgment sampling
4. Snowballing sampling
Convenience Sample
• Subjects are selected on the basis
of being readily available.
• The target population is defined
and the required sample size is
determined.
• Subjects are selected until the
required sample size is reached.
Advantages
• Very low cost
• Extensively used/understood
• No need for list of population
elements

Disadvantages
• Variability and bias cannot be
measured or controlled (volunteer
bias)

Quota Sampling
1. Select demographic characteristics of interest
(e.g. age, sex, ethnicity).
2. After dividing the target population into
homogeneous groups, the number of subjects
in each group will not be the same.
3. So we find the percentage composition of
each group in the population, similar to the
first stage of the stratified sampling method.
4. Then we choose the subjects using a
convenient procedure, on a first-come,
first-served basis.
Advantages
• Moderate cost
• Very extensively used/understood
• No need for list of population elements
• Introduces some elements of
stratification
• Representative with regard to known
characteristics
Disadvantages
• Variability and bias cannot be measured
or controlled (volunteer bias)

For example
In a study on the perception of outpatients of the services
provided at a hospital, the patients may be
subdivided into various age groups.
The target population is patients between 21 and 60
years old seeking services at the particular hospital.
The age groups are (21, 30), (31, 40), (41, 50) and (51, 60). The
percentages of patients in these groups, taken from hospital
records, were 10%, 30%, 40% and 20% respectively. If
the overall sample size is 50, then 50 × 10/100 =
5 patients will be chosen from the first group
(21, 30), and 15, 20 and 10 from the other
groups respectively.
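
The quota calculation, and the first-come, first-served filling of the quotas, can be sketched as follows. The function names and the small list of arriving patients are hypothetical and serve only to illustrate the idea.

```python
def quota_sizes(percentages, sample_size):
    """Number of subjects required from each group, given its percentage share."""
    return {group: round(sample_size * pct / 100)
            for group, pct in percentages.items()}

def fill_quotas(arrivals, quotas):
    """Accept subjects in order of arrival until each group's quota is full."""
    remaining = dict(quotas)
    accepted = []
    for subject_id, group in arrivals:
        if remaining.get(group, 0) > 0:
            accepted.append(subject_id)
            remaining[group] -= 1
    return accepted

quotas = quota_sizes({"21-30": 10, "31-40": 30, "41-50": 40, "51-60": 20}, 50)
print(quotas)

# Hypothetical arrivals: eight patients aged 21-30 turn up first;
# only the first five are accepted because that quota is 5.
arrivals = [(i, "21-30") for i in range(1, 9)]
print(fill_quotas(arrivals, quotas))
```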

Judgment sampling
• Subjects are chosen purposively on the basis of
having particular features.
• Used by specialists or authorities in a
specific area.
• Most case studies are done in this manner.
• The sample size may not be large, but an in-depth
study of the cases is the main focus.
• Also used when choosing controls for
epidemiological studies.
• Useful for rare characteristics.
Advantages
• Moderate cost
• Commonly used/understood
• Sample will meet a specific objective
• Useful for qualitative research
• Useful for rare characteristics
Disadvantages
• Bias

Snowballing sampling
• Researchers move from one known
case to another by referrals.
• Used for rare events (sentinel events).
• Enables the researcher to reach groups
that are otherwise hard to reach.
For example, when studying rare behaviors
in the population such as drug abuse.

Advantages
• Low cost
• Useful in specific circumstances
• Useful for locating rare
populations

Disadvantages
• Bias, because sampling units are not
independent