Bio Statistics For Medical Students
net/publication/339499419
All content following this page was uploaded by Hamze ALI Abdillahi on 26 February 2020.
1. Draw conclusions
2/26/2018 By Dr. HAMZE ALI ABDILLAHI 6
2. Make predictions about what will happen in other subjects
Examples
1) At Hargeisa General Hospital, 5% of the patients were diagnosed with DM last year
• Interested in the particular subjects that have been studied
• Aim is to be able to make some general statements about a wider set of subjects
1. Planning
2. Design
3. Data collection
4. Data Processing
5. Data Presentation
6. Data Analysis
7. Interpretation
8. Publication
Population & Sample
• Population: a complete set of items or subjects that can be studied.
• Target population: a collection of items that have something in common and about which we wish to draw conclusions at a particular time.
Generalization is a two-stage procedure: we want to generalize conclusions from the sample to the study population, and then from the study population to the target population.
Example
In a study of the prevalence of khat chewing among secondary students in Somalia, a random sample of secondary students in Hargeisa was taken.
Study population: secondary students in Hargeisa
Target population: secondary students in Somalia
1. Pie charts
2. Bar charts (simple and clustered bar charts)
3. Relative frequency (percentage) table
$\text{Relative frequency} = \frac{\text{Frequency}}{\text{Sum of all frequencies}}$
         Yes    No    Total
Boys       5    10       15
Girls     10    15       25
Total     15    25       40
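As a rough illustration of the relative frequency formula, the sketch below turns the four cell counts of the Boys/Girls table into percentages of the grand total. The function name `relative_frequencies` and the dictionary layout are assumptions made for this example only.

```python
# Cell counts from the table above: rows are Boys/Girls, columns are Yes/No.
counts = {
    ("Boys", "Yes"): 5,
    ("Boys", "No"): 10,
    ("Girls", "Yes"): 10,
    ("Girls", "No"): 15,
}


def relative_frequencies(freqs):
    """Return each frequency as a percentage of the sum of all frequencies."""
    total = sum(freqs.values())
    return {key: 100 * value / total for key, value in freqs.items()}


rel = relative_frequencies(counts)
for (row, col), pct in rel.items():
    print(f"{row:5s} {col:3s} {pct:5.1f}%")
```

The four percentages add up to 100, which is a quick check that no cell was missed.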
        YES     NO
YES      70    100
NO        3     70
18.3 21.9 23.0 24.3 25.4 26.6 27.5 28.8 30.9 34.4
19.2 21.9 23.1 24.3 25.6 26.9 27.5 28.8 30.9 34.9
19.8 21.9 23.1 24.5 25.7 27.1 27.6 28.9 31.0 35.0
20.2 22.3 23.3 24.6 25.7 27.3 28.2 29.3 31.1 35.5
20.7 22.3 23.4 24.6 25.8 27.3 28.3 29.5 31.3 35.8
20.8 22.3 23.5 24.7 25.8 27.3 28.3 29.8 31.6 35.9
21.1 22.4 24.0 24.7 25.9 27.3 28.3 30.0 31.6 36.6
21.1 22.5 24.0 24.8 25.9 27.4 28.4 30.1 32.6 37.1
21.1 22.7 24.0 24.8 26.2 27.4 28.6 30.2 32.8 37.5
21.3 22.7 24.1 25.0 26.5 27.4 28.7 30.3 33.2 37.8
21.3 22.8 24.1 25.4 26.5 27.4 28.7 30.8 33.6 38.2
21.5 22.9 24.2 25.4 26.5 27.4 28.8 30.8 34.2 38.8
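The 120 values above can be grouped into the seven class intervals used on the slides' charts. The following is a minimal sketch of that tabulation; the helper name `frequency_table` is an assumption, and the class limits are taken from the chart axis labels.

```python
# The 120 observations from the table above, typed row by row.
values = [
    18.3, 21.9, 23.0, 24.3, 25.4, 26.6, 27.5, 28.8, 30.9, 34.4,
    19.2, 21.9, 23.1, 24.3, 25.6, 26.9, 27.5, 28.8, 30.9, 34.9,
    19.8, 21.9, 23.1, 24.5, 25.7, 27.1, 27.6, 28.9, 31.0, 35.0,
    20.2, 22.3, 23.3, 24.6, 25.7, 27.3, 28.2, 29.3, 31.1, 35.5,
    20.7, 22.3, 23.4, 24.6, 25.8, 27.3, 28.3, 29.5, 31.3, 35.8,
    20.8, 22.3, 23.5, 24.7, 25.8, 27.3, 28.3, 29.8, 31.6, 35.9,
    21.1, 22.4, 24.0, 24.7, 25.9, 27.3, 28.3, 30.0, 31.6, 36.6,
    21.1, 22.5, 24.0, 24.8, 25.9, 27.4, 28.4, 30.1, 32.6, 37.1,
    21.1, 22.7, 24.0, 24.8, 26.2, 27.4, 28.6, 30.2, 32.8, 37.5,
    21.3, 22.7, 24.1, 25.0, 26.5, 27.4, 28.7, 30.3, 33.2, 37.8,
    21.3, 22.8, 24.1, 25.4, 26.5, 27.4, 28.7, 30.8, 33.6, 38.2,
    21.5, 22.9, 24.2, 25.4, 26.5, 27.4, 28.8, 30.8, 34.2, 38.8,
]

# Class intervals as printed on the chart axes (both limits inclusive).
classes = [(18.0, 20.9), (21.0, 23.9), (24.0, 26.9), (27.0, 29.9),
           (30.0, 32.9), (33.0, 35.9), (36.0, 38.9)]


def frequency_table(data, intervals):
    """Return (low, high, frequency, relative %, cumulative frequency) per class."""
    total = len(data)
    rows = []
    cumulative = 0
    for low, high in intervals:
        freq = sum(1 for x in data if low <= x <= high)
        cumulative += freq
        rows.append((low, high, freq, 100 * freq / total, cumulative))
    return rows


table = frequency_table(values, classes)
for low, high, freq, rel, cum in table:
    print(f"{low:4.1f} - {high:4.1f}  {freq:3d}  {rel:6.2f}%  {cum:3d}")
```

The frequency column feeds a histogram, the relative column a relative frequency chart, and the cumulative column an ogive.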
o Histogram
o Frequency polygon and ogive
o Stem-and-leaf plot
o Box and whisker plot (used when we are
[Figure: relative frequency (%) by class interval]
[Figure: frequency by class interval]
[Figure: cumulative frequency by class interval]
[Figure: relative frequency (%) by class interval, with the values labelled as below]
Class interval    Relative frequency (%)
18.0 – 20.9        5
21.0 – 23.9       20
24.0 – 26.9       26.67
27.0 – 29.9       23.33
30.0 – 32.9       12.5
33.0 – 35.9        7.5
36.0 – 38.9        5
6 | 4 8
7 | 1 2 5 8
8 | 0 1 2
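Sum of the squared values: $\sum X_i^2$
For example:
$\sum X_i^2 = 1.2^2 + 2.2^2 + 6.4^2 + 3.8^2 + 0.9^2 = 62.49$
Square of the sum: $\left(\sum X_i\right)^2$
For example:
$\left(\sum X_i\right)^2 = 14.5^2 = 210.25$
Multiplying each value by a constant: $\sum cX_i = c\sum X_i$
For example:
$\sum 60X_i = 60\sum X_i = 60 \times 14.5 = 870$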
Another common operation is to subtract a
constant from each observed value, square each
difference, and add the results. In summation
notation, this is written as:
$\sum (X_i - c)^2$.
For example, suppose we want to subtract 2.9 from each value, square each of the results, and then sum these squared differences. So c = 2.9, and
$\sum (X_i - c)^2 = (1.2 - 2.9)^2 + (2.2 - 2.9)^2 + \cdots + (0.9 - 2.9)^2 = 20.44$.
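The summation operations in these examples can be checked with a few lines of Python. This is only a sketch using the five example values 1.2, 2.2, 6.4, 3.8 and 0.9; the variable names are chosen for illustration.

```python
# The five observations used in the summation examples.
x = [1.2, 2.2, 6.4, 3.8, 0.9]

sum_x = sum(x)                                 # sum of the values
sum_of_squares = sum(xi ** 2 for xi in x)      # square each value, then add
square_of_sum = sum_x ** 2                     # add the values, then square
sum_cx = sum(60 * xi for xi in x)              # multiply each value by c = 60, then add
sum_sq_dev = sum((xi - 2.9) ** 2 for xi in x)  # subtract c = 2.9, square, then add

print(round(sum_x, 2))           # 14.5
print(round(sum_of_squares, 2))  # 62.49
print(round(square_of_sum, 2))   # 210.25
print(round(sum_cx, 2))          # 870.0
print(round(sum_sq_dev, 2))      # 20.44
```

Note that the sum of squares (62.49) and the square of the sum (210.25) are different quantities, which is why the position of the brackets matters.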
Basic Biostatistics
Measures of central tendency
Mean
The mean is the sum of all the values in a data set, divided by the number of values. The mean of a whole population is usually denoted by μ (called mu), while the mean of a sample is usually denoted by x̄ (called x-bar).
To calculate the mean:
Sum up all the values.
Divide the sum by the number of values.
Example 2
Data set is 4, 7, 5, 9, 5.
Calculate the mean.
Result
$\text{Mean} = \frac{4 + 7 + 5 + 9 + 5}{5} = 6$
Data set is 10, 12, 16, 14.
Calculate the mean.
Age      fi    mi    mi·fi
15-19    11    17     187
20-24    36    22     792
25-29    28    27     756
30-34    13    32     416
35-39     7    37     259
40-44     3    42     126
45-49     2    47      94
Total   100          2630
Mean = 2630/100 = 26.3
Trimmed mean
The median represents the most extreme amount of trimming: it trims all but one or two values. Although there are circumstances where this extreme amount of trimming can be beneficial, it can sometimes be detrimental.
No specific amount of trimming is always best, but 20% trimming is often a good choice in the literature. This means that the smallest 20%, as well as the largest 20%, of the values are trimmed and the average of the remaining data is computed.
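The grouped-data mean from the age table and the 20% trimmed mean described here can be sketched as follows. The function names are assumptions for this example, and rounding the number of trimmed values down to a whole number is one common convention rather than something stated on the slides. The trimmed mean is applied to the earlier data set 4, 7, 5, 9, 5 purely for illustration.

```python
import math


def grouped_mean(frequencies, midpoints):
    """Mean of grouped data: sum of (midpoint x frequency) / sum of frequencies."""
    weighted_sum = sum(f * m for f, m in zip(frequencies, midpoints))
    return weighted_sum / sum(frequencies)


def trimmed_mean(data, proportion=0.2):
    """Drop the smallest and largest `proportion` of the values, then average the rest."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be at least 0 and less than 0.5")
    ordered = sorted(data)
    # Number of values removed from each end (rounded down; an assumed convention).
    g = math.floor(proportion * len(ordered))
    kept = ordered[g:len(ordered) - g]
    return sum(kept) / len(kept)


# Frequencies (fi) and class midpoints (mi) from the age table.
fi = [11, 36, 28, 13, 7, 3, 2]
mi = [17, 22, 27, 32, 37, 42, 47]
print(round(grouped_mean(fi, mi), 1))           # 26.3

# 20% trimming of 4, 7, 5, 9, 5 removes one value from each end (4 and 9).
print(round(trimmed_mean([4, 7, 5, 9, 5]), 2))  # 5.67
```

With five values, 20% trimming leaves 5, 5 and 7, so the trimmed mean (5.67) sits a little below the ordinary mean (6).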
Age      fi    Cum. F
5-14      5      5
15-24    10     15
25-34    20     35
35-44    22     57
45-54    13     70
55-64     5     75
The mean versus the median
The mean is sensitive to outliers.
The median is not sensitive to outliers.
When the data are highly skewed, the median is usually preferred.
Example
Data values:
Ordered data: 1, 1, 3, 3, 4, 5, 60
The mean is: 77/7 = 11
Median location = $\frac{n + 1}{2} = \frac{7 + 1}{2} = 4$, so the median is the 4th ordered value, which is 3.
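A short check of this example, using Python's standard `statistics` module, shows how the single large value 60 pulls the mean but not the median. The second calculation, with 60 removed, is added here only for comparison.

```python
import statistics

data = [1, 1, 3, 3, 4, 5, 60]

mean = statistics.mean(data)            # 77 / 7
median = statistics.median(data)        # middle (4th) ordered value
median_location = (len(data) + 1) / 2   # position of the median in the ordered data

print(mean, median, median_location)    # 11 3 4.0

# Same data without the outlier 60, for comparison.
without_outlier = data[:-1]
print(round(statistics.mean(without_outlier), 2))  # 2.83
print(statistics.median(without_outlier))          # 3.0
```

Removing the outlier changes the mean from 11 to about 2.83, while the median stays at 3.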
Measures of dispersion
1. Range
2. Variation (SS): the sum of squared deviations from the mean
3. Variance (S²)
4. Standard deviation (S)
5. Standard error (SE)
6. Quartiles and interquartile range (IQR)
7. Coefficient of variation (CV)
Range
It is the difference between the maximum and the minimum data values.
R = XL − XS, where XL is the largest value and XS is the smallest value.
It is the simplest measure and can be easily understood. It takes into account only two values, which makes it a poor measure of dispersion. One application is in quality control charts, especially when small sample sizes are involved.
For example:
Data set: 4, 5, 6, 7, 14
R = 14 − 4 = 10
Standard deviation (S)
It is the square root of the variance. The variation (SS) is expressed in squared units of measurement, and when it is divided by (n − 1) to give the variance, the unit is still squared. To bring it back to the original unit of measurement, the square root of the variance must be obtained.
The standard deviation (SD) quantifies variability or scatter. Standard deviation is a measure of precision of the population distribution.
Standard error (SE)
It is an indication of sample-to-sample variation.
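The measures listed above can be computed step by step. The sketch below is an illustration only: the function name `dispersion_summary` is an assumption, and the standard error is taken to be the standard error of the mean, SD divided by the square root of n, since the slides' own formula is not shown. The data set 10, 12, 16, 14 is the one used elsewhere in these slides (SD = 2.58).

```python
import math


def dispersion_summary(data):
    """Return range, variation (SS), variance, SD and SE for a sample."""
    n = len(data)
    mean = sum(data) / n
    ss = sum((x - mean) ** 2 for x in data)  # sum of squared deviations from the mean
    variance = ss / (n - 1)                  # sample variance, still in squared units
    sd = math.sqrt(variance)                 # back to the original unit of measurement
    se = sd / math.sqrt(n)                   # assumed: standard error of the mean
    return {
        "range": max(data) - min(data),
        "SS": ss,
        "variance": variance,
        "SD": sd,
        "SE": se,
    }


summary = dispersion_summary([10, 12, 16, 14])
for name, value in summary.items():
    print(f"{name:8s} {value:.2f}")

# Range of the data set used in the range example.
print(dispersion_summary([4, 5, 6, 7, 14])["range"])  # 10
```

For 10, 12, 16, 14 this gives SS = 20, variance of about 6.67 and SD of about 2.58, matching the SD quoted in the slides.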
Coefficient of variation
Detecting outliers
Quartiles
Values which divide the sorted data set into four equal parts, so that each part represents 25% of the data. The quartiles are the 25th percentile, the 50th percentile, and the 75th percentile. One quarter of the values are less than or equal to Q1, the cut-point for the lower 25% of the data set.
Q2 is the median.
Q3 gives the cut-point for the upper 25% of the data set.
Uses of quartiles
1. Qs and IQR are used in the construction of the box plot.
o CV %
Ordered data = 10, 12, 14, and 16.
Mean = 13, median = 13, SD = 2.58
Coefficient of variation (CV)
o Also known as relative variability.
o It is the measure of normalised dispersion.
o It is the ratio between the measure of spread (standard deviation) and the measure of location (mean).
Example
Data set: 10, 12, 16, and 14.
Calculate the coefficient of variation.
Mean = 13, SD = 2.58
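As a sketch of this calculation, the coefficient of variation can be obtained by dividing the sample standard deviation by the mean and expressing the result as a percentage. Expressing it as a percentage is assumed here from the "CV %" label on the slides.

```python
import statistics

data = [10, 12, 16, 14]

mean = statistics.mean(data)   # 13
sd = statistics.stdev(data)    # sample standard deviation, about 2.58
cv_percent = 100 * sd / mean   # spread relative to location, in percent

print(round(sd, 2))            # 2.58
print(round(cv_percent, 2))    # 19.86
```

Because the CV has no unit, it can be used to compare the variability of measurements recorded on different scales.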
Detecting outliers
Outliers are values that are unusually
large or small.
$\frac{|100{,}000 - 20{,}002.5|}{42{,}162.38} = 1.897$
The box plot rule
The box plot rule is another rule for outlier detection. It is based on the fundamental strategy of avoiding masking by replacing the mean and standard deviation with measures of location and dispersion that are relatively insensitive to outliers.
This rule is based on the lower and upper quartiles, as well as the interquartile range, which provide resistance to outliers.
The box plot rule declares the value X an outlier if
$X < q_1 - 1.5(q_2 - q_1)$ or
$X > q_2 + 1.5(q_2 - q_1)$,
where q1 is the lower quartile and q2 is the upper quartile.
For example:
Data values are:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 100, 500.
The lower quartile is q1 = 4.417, the upper quartile is q2 = 12.583,
so $q_2 + 1.5(q_2 - q_1) = 12.583 + 1.5(12.583 - 4.417) = 24.83$.
The values 100 and 500 are greater than 24.83, so they are declared outliers.
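The box plot rule can be applied to this example with a short function. This is a sketch only: the function name `boxplot_outliers` is an assumption, and the quartiles q1 = 4.417 and q2 = 12.583 are passed in as given on the slide rather than recomputed, because different quartile methods give slightly different values.

```python
def boxplot_outliers(data, q_lower, q_upper):
    """Return (outliers, lower cut-off, upper cut-off) under the box plot rule."""
    spread = q_upper - q_lower            # interquartile range
    low_cut = q_lower - 1.5 * spread
    high_cut = q_upper + 1.5 * spread
    outliers = [x for x in data if x < low_cut or x > high_cut]
    return outliers, low_cut, high_cut


values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 100, 500]

# Quartiles as stated in the example above (taken as given, not recomputed).
outliers, low_cut, high_cut = boxplot_outliers(values, q_lower=4.417, q_upper=12.583)

print(round(low_cut, 2), round(high_cut, 2))  # -7.83 24.83
print(outliers)                               # [100, 500]
```

Both 100 and 500 are flagged, whereas a rule based on the mean and standard deviation can miss such values because the outliers themselves inflate the standard deviation.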
[Diagram: variables are either categorical (qualitative) or numerical (quantitative); numerical variables are either discrete or continuous]
Samples
is costly
Cluster Samples
• Population divided into several “clusters,” each representative of the population
• Simple random sample selected from each
• The samples are combined into one
[Figure: population divided into 4 clusters]
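The three steps above can be sketched in Python as follows. Everything in this example is hypothetical: the four clusters, the subject labels, the sample size of 5 per cluster and the function name `sample_from_clusters` are made up for illustration, and the random seed is fixed only so that the result is reproducible.

```python
import random


def sample_from_clusters(clusters, per_cluster, seed=None):
    """Take a simple random sample from each cluster and combine them into one sample."""
    rng = random.Random(seed)
    combined = []
    for members in clusters.values():
        # Simple random sample without replacement from this cluster.
        combined.extend(rng.sample(members, per_cluster))
    return combined


# Hypothetical population of 100 subjects divided into 4 clusters of 25.
clusters = {
    f"cluster_{k}": [f"c{k}_subject_{i}" for i in range(1, 26)]
    for k in range(1, 5)
}

sample = sample_from_clusters(clusters, per_cluster=5, seed=1)
print(len(sample))  # 20
```

Each cluster contributes the same number of subjects here; in practice the number drawn per cluster would depend on the study design.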
Advantages
Low cost
Requires list of all clusters
Can estimate characteristics of both
cluster and population
Disadvantages
Non-probability Samples
Disadvantages
Quota Sampling
1. Select demographic characteristics of interest
(e.g. age, sex, ethnicity).
Moderate cost
Very extensively used/understood
No need for list of population elements
Judgment sampling
Subjects chosen purposively on the basis of
having particular features
Used by specialists or authorities in a
specific area.
Most case studies are done in this manner.
Sample size may not be large, but an in-depth study of the cases is the main focus.
Also used when choosing controls for
epidemiological studies.
Useful for rare characteristics
Advantages
Moderate cost
Commonly used/understood
Sample will meet a specific objective
Disadvantages
Bias
Advantages
Low cost
Useful in specific circumstances
Useful for locating rare populations