INTRODUCTION TO STATISTICS
INTRODUCTION
Definition of Statistics
Scope of Statistics
Kinds of Variables
Levels of Measurement
Types of Data
STATISTICS
Statistics is the science of collecting, organizing,
analyzing, and interpreting data in order to make
decisions.
Data consists of information coming from observations, counts,
measurements, or responses.
A population is the collection of all outcomes,
responses, measurements, or counts that are of interest.
A sample is a subset of a population.
Definition of Statistics
STATISTICS is the art and science of the collection, organization, presentation, analysis, and interpretation of numeric or quantitative data.
Examples:
Houses are built of hollow blocks, wood, etc.
How many houses were built of hollow blocks
Proportion of houses built of wood.
Parameter & Statistic
A parameter is a numerical measure that describes a
characteristic of a population.
A statistic is a numerical measure that describes a
characteristic of a sample.
Parameter: describes a population
Statistic: describes a sample
Parameter & Statistic
EXAMPLES
Parameter
The population mean of the electricity bills of the
residents of a certain city is Php 1500.00
Statistic
The sample mean of the electricity bills of 20
residents of a certain city is Php 1500.00
Parameter & Statistic
A sociologist wants to estimate the proportion of adults with children
under the age of 18 that eat dinner together 7 nights a week. A
simple random sample of 1122 adults with children under the age
of 18 was obtained, and 337 of those adults reported eating
dinner together with their families 7 nights a week.
Parameter
The proportion of adults with kids under 18 who ate together
7 nights a week.
Statistic
337/1122 = 0.300, the proportion in the sample who ate
together.
Parameter & Statistic
An education official wants to estimate the proportion of adults
aged 18 or older who had read at least one book during the
previous year. A random sample of 1006 adults aged 18 or older
is obtained, and 835 of those adults had read at least one book
during the previous year.
Parameter
The proportion of adults 18 or older who read a book in the
previous year.
Statistic
835/1006 = 0.830, the proportion who read a book in the
sample.
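The two statistics above are plain sample proportions. As a minimal sketch (the function name `sample_proportion` is an illustrative choice, not part of the original material), the arithmetic can be checked like this:

```python
def sample_proportion(successes: int, sample_size: int) -> float:
    """Return the sample proportion (a statistic): successes / sample_size."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return successes / sample_size


# Counts taken from the two examples above.
dinner = sample_proportion(337, 1122)
readers = sample_proportion(835, 1006)

print(f"Ate dinner together 7 nights a week: {dinner:.3f}")
print(f"Read at least one book last year:    {readers:.3f}")
```

Each value describes only the sample. The corresponding population proportion (the parameter) remains unknown and is what the sample value is used to estimate.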
Scope of Statistics
The study of statistics has two major branches:
descriptive statistics and inferential statistics.
Statistics
Descriptive statistics: Involves the organization, summarization, and display of data.
Inferential statistics: Involves using a sample to draw conclusions about a population.
Scope of Statistics
DESCRIPTIVE STATISTICS
Gathering: Survey, Observation, Use of Existing Records, Experimental
Classification (Data Organization): Raw Data, Array, Frequency Distribution Table (Single Value, Grouping)
Presentation: Textual, Tabular, Graphical
Summarizing a collection of values (computing measures): Central Tendency, Variability, Percentages/Ratios/Proportions, Others (Quantiles/Fractiles)
Descriptive Statistics
It describes the important characteristics or
properties of the data using measures of
central tendency, such as the mean, median, and mode,
and measures of dispersion, such as the range, standard
deviation, and variance.
Data can be summarized and presented
accurately using charts, tables, and graphs.
Descriptive Statistics
Example:
We have marks of 1000 students and we may be
interested in the overall performance of those students
and the distribution as well as the spread of marks.
Descriptive statistics provides the tools to describe
our data in the most understandable and appropriate
way.
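A minimal sketch of the measures named above, using Python's standard `statistics` module. The ten marks are hypothetical values made up for illustration; they stand in for the 1000 marks in the example.

```python
import statistics

# Hypothetical marks for ten students (illustrative values, not from the source).
marks = [72, 85, 90, 85, 64, 78, 95, 85, 70, 88]

summary = {
    # Measures of central tendency
    "mean": statistics.mean(marks),
    "median": statistics.median(marks),
    "mode": statistics.mode(marks),
    # Measures of dispersion (population formulas, treating the marks as the whole group)
    "range": max(marks) - min(marks),
    "variance": statistics.pvariance(marks),
    "std_dev": statistics.pstdev(marks),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

If the marks were instead a sample drawn from a larger group, `statistics.variance` and `statistics.stdev` (which divide by n - 1) would be the usual choice.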
Scope of Statistics
INFERENTIAL STATISTICS
A method or technique that uses a small portion of the total
set of data in order to draw conclusions or make
judgments regarding the entire set.
Scope of Statistics
Statistical Inference
Examples: Predict life span of bulbs; compare effectiveness of two reducing diets
Tools: Probability theory; risks/odds methods
VARIABLES
A variable is a characteristic of a unit of
observation or subject that can take on
different values for different units/subjects
or for the same unit/subject at different
periods.
About Variable
Variable: HEIGHT
Attributes/characteristics: Small (5’2), Medium (5’6), Tall (5’8)
Kinds of Variables
Qualitative Variable
A qualitative variable takes on
non-numerical values.
It simply describes the class or category into which
the observations fall, and its values are thus also known as
categorical data.
Kinds of Variables
Qualitative Variables
Sex: Male, Female
Hair color: Black, Blonde, Brown
Religion: Catholic, Protestant, INC
Occupation: Teacher, Doctor, Engineer
Nationality: Filipino, American, Hispanic
Kinds of Variables
Quantitative Variable
A quantitative variable may take any
value from a given set of values. It
has actual units of measure.
Examples: Height, Ages, Family Size
Kinds of Variables
Quantitative Variables
Discrete
Number of overweight persons
0, 1, 2, 3 ….
Continuous
Weight in kilograms
65.6 kg, 55.34 kg, 100 kg, ¾ kg . . .
Exercises:
For each of the following statements, decide whether it
belongs to the field of descriptive statistics or
inferential statistics.
1. A badminton player wants to know
his average score for the past 10
games.
Answer:
Descriptive Statistics
2. A car manufacturer wishes to estimate the
average lifetime of batteries by testing a
sample of 50 batteries.
Answer:
Inferential Statistics
3. Janine wants to determine the variability of her six
exam scores in Physics
Answer:
Descriptive Statistics
4. A shipping company wishes to estimate the number of
passengers traveling via their ships next year using their
data on the number of passengers in the past three years.
Answer:
Inferential Statistics
5. A politician wants to determine the total number
of votes his rival obtained in the past election based
on his copies of the tally sheet of electoral returns.
Answer:
Descriptive Statistics
Levels of Measurement
MEASUREMENT
is a set of rules for assigning
numbers to attributes of observations.
It is structured in such a way that the
existing relationship between the
observations is preserved in the
numbers assigned to them.
About Measurement
Levels of Measurement
Nominal
Ordinal
Interval
Ratio
Levels of Measurement
Nominal scale
Is the simplest scale of
measurement where a value or unit
of data is assigned to one of at
least two qualitative classes or
categories.
Levels of Measurement
SEX
MALE FEMALE MALE
1 2 1
RULE: Identification
LEVEL: Nominal
Levels of Measurement
Examples of nominal data:
The psychiatric system of diagnostic groups: Schizophrenic, Paranoid, Manic-depressive, Psychoneurotic
Jersey numbers: 28, 16
Automobile license plates: NJB020401, NUU112900
Employment classification:
1 - Educator
2 - Construction worker
3 - Manufacturing worker
4 - Lawyer
5 - Doctor
6 - Others
Levels of Measurement
Nominal scale
Conditions:
1. Exhaustive – every value or unit
of data can be assigned to a
category.
2. Mutually exclusive – it is not
possible to assign a value to more
than one category because the
categories do not overlap.
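The two conditions above can be checked mechanically for a coded data set. The following is a minimal sketch under stated assumptions: the function `check_nominal_coding` and the respondent records are hypothetical, and each unit is represented by the set of categories it was assigned to.

```python
def check_nominal_coding(assignments):
    """Check a coding scheme; `assignments` maps each unit to the set of categories it received.

    Returns (exhaustive, mutually_exclusive).
    """
    # Exhaustive: every unit falls into at least one category.
    exhaustive = all(len(categories) >= 1 for categories in assignments.values())
    # Mutually exclusive: no unit falls into more than one category.
    mutually_exclusive = all(len(categories) <= 1 for categories in assignments.values())
    return exhaustive, mutually_exclusive


# Hypothetical respondents coded by employment classification.
good_coding = {"R1": {"Educator"}, "R2": {"Lawyer"}, "R3": {"Others"}}
bad_coding = {"R1": {"Educator", "Lawyer"}, "R2": set()}

print(check_nominal_coding(good_coding))
print(check_nominal_coding(bad_coding))
```

A catch-all class such as "Others" is a common way to make a category list exhaustive.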
Levels of Measurement
Ordinal scale
It involves placement of values or
codes in some rank order to create
an ordinal scale variable.
The relationship between observations
takes on the form of “greater than” and
“less than” or “higher than” and “lower
than”
Levels of Measurement
EDUCATION
ELEM HS COLLEGE
1 2 3
Rule: MAGNITUDE
Level: ORDINAL
Levels of Measurement
Nominal vs. Ordinal
Nominal: qualitative; exhaustive/mutually exclusive; categories are equal in value (Team A = Team B = Team C)
Ordinal: qualitative; exhaustive/mutually exclusive; categories are ranked (1st place > 2nd place > 3rd place)
Levels of Measurement
Examples of ordinal data:
Social class: Lower, Middle, Upper
Academic grades: A, B, C, D, E, F
Intensity of attitude: Strongly agree, agree, neutral, disagree, strongly disagree
Quality of service:
(5) Excellent
(4) Very Satisfactory
(3) Satisfactory
(2) Needs Improvement
(1) Poor
Levels of Measurement
Interval scale
The assignment of numbers to observations is
based not only on the order in which they
possess a certain attribute but also indicates
exactly how much of the attribute they possess.
In this measurement we can determine how
many units of difference there are from one
rank to the next.
Levels of Measurement
GRADES IN STATISTICS
80 85 90
80 85 90
Rule: INTERVAL
Level: INTERVAL
Levels of Measurement
Interval scale
The zero point is arbitrary; it does not mean the absence of the quantity.
Example:
Celsius -18 0 10 30 100
Fahrenheit 0 32 50 86 212
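A short sketch of why an arbitrary zero matters, using the standard Celsius to Fahrenheit conversion on two of the temperatures in the table. The same pair of temperatures gives different ratios on the two scales, so a statement like "three times as hot" is not meaningful, while equal differences on one scale stay equal on the other.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


low_c, high_c = 10, 30
low_f, high_f = c_to_f(low_c), c_to_f(high_c)

ratio_c = high_c / low_c   # ratio on the Celsius scale
ratio_f = high_f / low_f   # ratio of the same two temperatures in Fahrenheit

print(f"Celsius:    {high_c} / {low_c} = {ratio_c:.2f}")
print(f"Fahrenheit: {high_f:.0f} / {low_f:.0f} = {ratio_f:.2f}")

# Differences are preserved up to the constant factor 9/5.
print(c_to_f(30) - c_to_f(20), c_to_f(20) - c_to_f(10))
```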
Levels of Measurement
Ratio Measurement
Has all the features of an interval scale.
Requires an absolute, fixed, and
non-arbitrary zero point.
The ratio of two numbers is meaningful.
Levels of Measurement
WEEKLY INCOME
P 2,000 P 2,500 P0
2,000 2,500 0
Rule: ABSOLUTE ZERO
Level: RATIO
Levels of Measurement
Ratio Measurement
Examples of ratio data:
Weight (2 kg, 40 lbs, ounce)
Age (90, 80, 85)
Height
Time
Volume
Years of school completed
Per capita GNP
Number of children born
Weeks of unemployment
Years in present job
Travel time to work (minutes)
Levels of Measurement
COMPARATIVE SUMMARY
Nominal: Attributes are labels only
Ordinal: Attributes can be ordered
Interval: Distance between attributes is meaningful
Ratio: Absolute zero
Levels of Measurement
IMPORTANCE OF UNDERSTANDING THE
LEVELS OF MEASUREMENT
1. Helps you decide how to interpret
the data.
2. Helps you decide what statistical
analysis is appropriate on the
values that were assigned.
Types of Data
2 Types
PRIMARY DATA
SECONDARY DATA
Types of Data
PRIMARY DATA
Any set of data or information that
is directly collected from the source
(informants, respondents, or records).
Government statistical agencies are
given the responsibility to collect,
publish and disseminate statistical
series.
Types of Data
SECONDARY DATA
Data are provided directly by an
organization or government agency in
convenient form such as written report.
Data that are processed and re-processed
by individuals or entities from sources other
than the primary source of information.
DATA
CLASSIFICATION
Types of Data
Data sets can consist of two types of data:
qualitative data and quantitative data.
Data
Qualitative Data: Consists of attributes, labels, or nonnumerical entries.
Quantitative Data: Consists of numerical measurements or counts.
Qualitative and Quantitative Data
Example:
The grade point averages of five students are listed in the table. Which data are qualitative data and which are quantitative data?
Student (qualitative data) | GPA (quantitative data)
Sally | 3.22
Bob | 3.98
Cindy | 2.75
Mark | 2.24
Kathy | 3.84
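The distinction in the GPA table can be sketched with a simple type check. This is only a rough illustration: the helper `data_type` is hypothetical, and a type check is a simplification, since numeric codes such as jersey numbers are still qualitative even though they are stored as numbers.

```python
# GPA table from the example above: (student name, grade point average).
students = [
    ("Sally", 3.22),
    ("Bob", 3.98),
    ("Cindy", 2.75),
    ("Mark", 2.24),
    ("Kathy", 3.84),
]


def data_type(value):
    """Classify one entry: numbers count as quantitative, everything else as qualitative."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    return "quantitative" if is_number else "qualitative"


names, gpas = zip(*students)
name_types = {data_type(v) for v in names}
gpa_types = {data_type(v) for v in gpas}

print("Student column:", name_types)
print("GPA column:", gpa_types)
```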
Qualitative and Quantitative Data
Identify which of the following represent qualitative variables and which
represent quantitative variables.
1. hair color
2. height, weight
3. time in the 100 yard dash
4. religion
5. number of items sold to a shopper
6. political party, profession
Levels of Measurement
The level of measurement determines which statistical
calculations are meaningful. The four levels of
measurement are: nominal, ordinal, interval, and ratio.
Levels of Measurement, from lowest to highest:
Nominal
Ordinal
Interval
Ratio
Nominal Level of Measurement
Data at the nominal level of measurement are
qualitative only.
Nominal: Categorized using names, labels, or qualities. No
mathematical computations can be made at this level.
Examples:
Colors in the US flag
Names of students in your class
Textbooks you are using this semester
Ordinal Level of Measurement
Data at the ordinal level of measurement are qualitative
or quantitative.
Ordinal: Arranged in order, but differences between data
entries are not meaningful.
Examples:
Class standings: freshman, sophomore, junior, senior
Numbers on the back of each player’s shirt
Top 50 songs played on the radio
Interval Level of Measurement
Data at the interval level of measurement are
quantitative. A zero entry simply represents a position
on a scale; the entry is not an inherent zero.
Interval: Arranged in order, and the differences
between data entries can be calculated.
Examples:
Temperatures
Years on a timeline
Atlanta Braves World Series victories
Ratio Level of Measurement
Data at the ratio level of measurement are similar to the
interval level, but a zero entry is meaningful.
Ratio: A ratio of two data values can be formed, so one data value can be
expressed as a multiple of another.
Examples:
Ages
Grade point averages
Weights
Summary of Levels of Measurement
Level of measurement | Put data in categories | Arrange data in order | Subtract data values | Determine if one data value is a multiple of another
Nominal | Yes | No | No | No
Ordinal | Yes | Yes | No | No
Interval | Yes | Yes | Yes | No
Ratio | Yes | Yes | Yes | Yes
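The summary table is cumulative: each level allows everything the levels below it allow, plus one more operation. A minimal sketch of that lookup follows; the operation labels and function names are illustrative choices, not standard terminology.

```python
# Levels from lowest to highest, and the operation each level adds.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]
OPERATIONS = ["categorize", "order", "subtract", "multiple"]


def allowed_operations(level):
    """Return the operations that are meaningful at the given level of measurement."""
    rank = LEVELS.index(level.lower())  # raises ValueError for an unknown level
    return OPERATIONS[: rank + 1]


def is_meaningful(level, operation):
    """True if `operation` is meaningful for data at `level`."""
    return operation in allowed_operations(level)


for level in LEVELS:
    print(f"{level:<8} -> {', '.join(allowed_operations(level))}")
```

For example, `is_meaningful("ordinal", "subtract")` is false, which matches the "No" in the Ordinal row of the table.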
STATISTICS APPLIED
TO RESEARCH
SAMPLING DESIGN: Basic
Concepts and Procedure
The goal in sampling is to obtain
individuals for a study in such a way that
accurate information about the
population can be obtained.
Reason for Sampling
• Important that the individuals included
in a sample represent a cross section of
individuals in the population.
• If the sample is not representative, it is
biased, and you cannot generalize to the
population from your statistical data.
Definition:
Sampling Technique/Sampling Strategies
It is a plan you set forth to be sure that the
sample you use in your research study
represents the population from which you
drew your sample.
Definition:
Sampling Frame
This is the list of the elements in your
population from which your sample is drawn.
Sampling Bias
This involves problems in your sampling,
which reveals that your sample is not
representative of your population.
Selection Bias
1. Deliberately or purposively selecting a
“representative” sample.
2. Misspecifying the target population.
3. Failing to include all of the target population in
the sampling frame, called undercoverage.
4. Including population units in the sampling
frame that are not in the target population,
called overcoverage.
Selection Bias
5. Having multiplicity of listings in the sampling
frame.
6. Substituting a convenient member of a population
for a designated member who is not readily
available.
7. Failing to obtain responses from all of the chosen
sample. (Nonresponse)
8. Allowing the sample to consist entirely of
volunteers.
Advantages of Sampling Over Complete Enumeration
• Less Labor
• Reduced Cost
• Greater Speed
• Greater Scope
• Greater Efficiency and Accuracy
• Convenience
• Ethical Considerations
Two Types of Samples
1. Probability Sample
2. Non-Probability Sample
Probability Samples
• Samples are obtained using some
objective chance mechanism, thus
involving randomization.
• They require the use of a complete
listing of the elements of the universe,
called the sampling frame.
Probability Samples
• The probabilities of selection are known.
• They are generally referred to as random
samples.
• They allow drawing of valid generalizations
about the universe/ population.
Non-Probability Samples
• Samples are obtained haphazardly,
selected purposively or are taken as
volunteers.
• The probabilities of selection are
unknown.
• They should not be used for statistical
inference.
Sampling Procedure
Identify the population.
Determine if population is accessible.
Select a sampling method.
Choose a sample that is representative of
the population.
Ask the question, can I generalize to the
general population from the accessible
population?
Basic Sampling Techniques of
Probability Sampling
• Simple Random Sampling
• Systematic Random Sampling
• Stratified Random Sampling
• Cluster Sampling
• Multi-stage Sampling
Simple Random Sampling
• Most basic method of drawing
a probability sample
• Assigns equal probabilities of selection to
each possible sample
• Results in a simple random sample
Simple Random Sampling
Advantage:
It is very simple and easy to use.
Disadvantage:
Difficulty of gaining access to a list of a larger
population, time consuming and expensive.
When to Use:
This is preferable if the population is not
widely spread geographically. It is also more
appropriate if the population is more or less
homogeneous with respect to the characteristic
under study.
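A minimal sketch of drawing a simple random sample with Python's standard `random` module. The sampling frame of 500 numbered units and the fixed seed are assumptions made only so the example is reproducible.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, each subset of size n being equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)  # sampling without replacement


# Hypothetical sampling frame: units numbered 1 to 500.
frame = list(range(1, 501))
srs = simple_random_sample(frame, 50, seed=42)

print(sorted(srs)[:10])
```

`random.sample` selects without replacement, so no unit can appear twice in the sample.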
Systematic Random Sampling
• It is obtained by selecting every kth
individual from the population.
• The first individual selected corresponds
to a random number between 1 and k.
Systematic Random Sampling
Obtaining a Systematic Random Sample
Advantage:
Drawing of the sample is easy. It is easy to
administer in the field, and the sample is spread
evenly over the population.
Disadvantage:
May give poor precision when unsuspected
periodicity is present in the population.
When to Use:
This is advisable to use if the ordering of the
population is essentially random and when
stratification with numerous data is used.
Example:
We want to select a sample of 50
students from 500 students. Under this
method, every kth item is picked from the
sampling frame.
Solution:
The sampling interval is k = 500/50 = 10. We start at a
random number i between 1 and k and take every kth unit
after it. Suppose the random number i is 6; then we select
units 6, 16, 26, 36, 46, ..., 496.
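A minimal sketch of systematic selection for this example, assuming units are numbered 1 to N and that N is a multiple of n. With N = 500 and n = 50 the interval is k = 10, and a random start of 6 gives units 6, 16, 26, and so on. The function name and arguments are illustrative.

```python
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Select every kth unit number, beginning at a random start between 1 and k."""
    k = population_size // sample_size          # sampling interval
    if start is None:
        start = random.Random(seed).randint(1, k)
    if not 1 <= start <= k:
        raise ValueError("start must be between 1 and k")
    return [start + j * k for j in range(sample_size)]


picked = systematic_sample(500, 50, start=6)
print(picked[:5], "...", picked[-1])
```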
Stratified Random Sampling
• It is obtained by separating the population
into non-overlapping groups called strata
and then obtaining a simple random
sample from each stratum.
• The individuals within each stratum
should be homogeneous (or similar) in
some way.
Stratified Random Sampling
Advantage:
The selection of units using a stratified procedure
adds greater precision because it improves the
potential for the units to be more evenly spread
over the population.
Disadvantage:
Values of the stratification variable may not be
easily available for all units in the population,
especially if the characteristic of interest is
homogeneous. It is possible that one or two strata
have no representative units. Also,
transportation costs can be high if the population
covers a wide geographic area.
Polytechnic University of the Philippines
When to Use:
We need to have information in the sampling
frame that can be used to form the strata. For
each group, we need to know how many and
which members of the population belong to that
group. When such information is available, it is
easy to use stratified random sampling.
Example:
A sample of 50 students is to be drawn
from a population consisting of 500
students belonging to two institutions A
and B. The number of students in the
institution A is 200 and the institution B
is 300. How will you draw the sample
using proportional allocation?
Solution:
There are two strata in this case.
Given: $N_1 = 200$, $N_2 = 300$, $N = 500$, $n = 50$
If $n_1$ and $n_2$ are the sample sizes for institutions A and B, then
$n_1 = n\left(\frac{N_1}{N}\right) = 50\left(\frac{200}{500}\right) = 20$
$n_2 = n\left(\frac{N_2}{N}\right) = 50\left(\frac{300}{500}\right) = 30$
The sample sizes are 20 from A and 30 from B. Then the units
from each institution are to be selected by simple random
sampling.
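The proportional allocation in this solution can be sketched as follows. The student IDs are hypothetical placeholders for the two institutions' lists, and rounding is a simplification: it gives whole numbers here, but in general the rounded allocations may need adjusting so that they add up to n.

```python
import random


def proportional_allocation(stratum_sizes, n):
    """Allocate a total sample of size n to strata in proportion to their sizes."""
    total = sum(stratum_sizes.values())
    return {name: round(n * size / total) for name, size in stratum_sizes.items()}


def stratified_sample(strata, n, seed=None):
    """Draw a simple random sample from each stratum using proportional allocation."""
    rng = random.Random(seed)
    sizes = {name: len(units) for name, units in strata.items()}
    allocation = proportional_allocation(sizes, n)
    return {name: rng.sample(units, allocation[name]) for name, units in strata.items()}


allocation = proportional_allocation({"A": 200, "B": 300}, 50)
print(allocation)

# Hypothetical student IDs for the two institutions.
strata = {
    "A": [f"A{i:03d}" for i in range(1, 201)],
    "B": [f"B{i:03d}" for i in range(1, 301)],
}
stratified = stratified_sample(strata, 50, seed=1)
print({name: len(units) for name, units in stratified.items()})
```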
Cluster Sampling
• You take the sample from naturally
occurring groups in your population.
• The clusters are constructed such that the
sampling units are heterogeneous within
the cluster and homogeneous among the
clusters.
Cluster Sampling
Obtaining a Cluster Sample
1. Divide the population into non-overlapping clusters.
2. Number the clusters in the population from 1 to N.
3. Select n distinct numbers from 1 to N using a
randomization mechanism. The selected clusters are the
clusters associated with the selected numbers.
4. The sample will consist of all the elements in the
selected clusters.
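The four steps above can be sketched as follows. The cluster names and household IDs are hypothetical, and `cluster_sample` is an illustrative helper rather than a standard library function.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and return every element in the selected clusters.

    `clusters` maps each cluster name to the list of its elements.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # steps 2 and 3: pick n distinct clusters
    elements = [unit for name in chosen for unit in clusters[name]]  # step 4: take all elements
    return chosen, elements


# Step 1: hypothetical non-overlapping clusters (for example, households grouped by area).
clusters = {
    "Area 1": ["HH-101", "HH-102", "HH-103"],
    "Area 2": ["HH-201", "HH-202"],
    "Area 3": ["HH-301", "HH-302", "HH-303", "HH-304"],
    "Area 4": ["HH-401", "HH-402", "HH-403"],
}

chosen, elements = cluster_sample(clusters, 2, seed=7)
print(chosen)
print(elements)
```

Only the list of clusters is needed to make the selection; a full list of population units is not required.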
Advantage:
There is no need to come up with a list of units in the
population; all that is needed is a list of the
clusters. It is also less costly since the elements are
physically closer together.
Disadvantage:
In actual field applications, adjacent households tend to
have more similar characteristics than households
distantly apart.
When to Use:
This method is preferable if the population can be grouped
into clusters where individual population elements are known to be
different with respect to the characteristics under study.
Example:
A researcher wants to survey academic performance of
high school students in MIMAROPA.
1. The researcher can divide the entire population into different clusters
(Mindoro, Marinduque, Romblon, and Palawan). There are 4 clusters.
2. Then the researcher selects a number of clusters, depending on
the research, through simple or systematic random sampling.
3. Then, from the selected clusters, the researcher can either
include all the high school students as subjects or select a
number of subjects from each cluster.
Multi - Stage Sampling
Selection of the sample is done in two or
more steps or stages, with sampling units
varying in each stage.
Multi - Stage Sampling
Obtaining a Multi-Stage Sample
1. Organize the sampling process into stages
where the unit of analysis is systematically
grouped.
2. Select a sampling technique for each
stage.
3. Systematically apply the sampling
technique to each stage until the unit of
Advantage:
Transportation costs are greatly reduced since there is
some form of clustering among the ultimate or final
samples; i.e., they are in the same lower-stage units.
Disadvantage:
Due to the fact that multi-stage sampling cuts out
portions of the population from the study, the study’s
findings can never be 100 percent representative of the
population.
When to Use:
If the population covers a wide area.
Example:
https://research-methodology.net/sampling-in-primary-data-collection/multi-stage-sampling/
Example:
A researcher wants to survey academic performance of high
school students in MIMAROPA.
1. The researcher can divide the entire population into different clusters
(Mindoro, Marinduque, Romblon, and Palawan). There are 4
clusters.
2. Then the researcher selects a number of clusters, depending
on the research, through simple or systematic random sampling.
3. Then, from the selected clusters, the researcher can either
include all the high school students as subjects or select a
number of subjects from each cluster.
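A minimal sketch of a staged selection for this example. Only the four province names come from the text. The intermediate stage of schools, the student IDs, and the stage sizes are assumptions added purely for illustration, and simple random sampling is used at every stage.

```python
import random


def multistage_sample(frame, n_provinces, n_schools, n_students, seed=None):
    """Three-stage sample: provinces, then schools within provinces, then students within schools.

    `frame` maps province -> school -> list of student IDs.
    """
    rng = random.Random(seed)
    selected = []
    for province in rng.sample(sorted(frame), n_provinces):              # stage 1
        schools = frame[province]
        for school in rng.sample(sorted(schools), n_schools):            # stage 2
            for student in rng.sample(schools[school], n_students):      # stage 3
                selected.append((province, school, student))
    return selected


provinces = ["Mindoro", "Marinduque", "Romblon", "Palawan"]

# Hypothetical frame: 3 schools per province, 20 students per school.
frame = {
    p: {
        f"{p} School {i}": [f"{p[:3].upper()}-{i}-{j:02d}" for j in range(1, 21)]
        for i in range(1, 4)
    }
    for p in provinces
}

stage_sample = multistage_sample(frame, n_provinces=2, n_schools=2, n_students=5, seed=11)
print(len(stage_sample))
print(stage_sample[:3])
```

The sampling unit changes at each stage (province, school, student), which is what distinguishes this design from single-stage cluster sampling.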
References:
http://www.economicsdiscussion.net/statistics/sampling/advantages-of-sampling-over-complete-enumeration-in-statistics/11980
http://www.natco1.org/research/files/SamplingStrategies.pdf
https://data36.com/statistical-bias-types-explained/
Statistics: Informed Decisions Using Data by Michael Sullivan III, Fifth Edition
Sampling: Design and Analysis by Sharon L. Lohr, Second Edition
UP Mathematics - Ms. Katrina D. Elizon