8_1_categorical_data_ninell

Uploaded by Sophia Lindholm

Chi-Square, Likelihood, Effect Sizes, and Categorical Data Analysis

Lecture 14
Empirical Methods 2 & Theory of Science
29.10.2024

Last time
• Recap Outliers
• Problems with Null Hypothesis Significance Testing (NHST)
• Effect sizes (Cohen’s d)
• Correlation (Pearson’s r)

Today :)
• 'Families' of statistical relationships and their associated tests and metrics
• Statistical Analysis of Categorical Variables
• Chi-Square Test
• Categorical effect sizes: Odds Ratio

Recap: Why Are There Different Statistical Tests?

Key Point: Different scenarios require different tests because of assumptions
about the data and what we are trying to compare.
• Assumptions: Tests make different assumptions about data (e.g.,
normality, equal variances, paired or independent samples).
• Data Types: Different tests are needed for continuous vs. categorical data,
or parametric vs. non-parametric situations.
• Number or Size of Groups: Tests vary depending on whether you're
comparing two groups (t-tests) or multiple groups (ANOVA).

Families of statistical relationships

Data Type (Outcome) | Categorical Predictor (two, three groups)         | Continuous Predictor (many "groups")
Categorical Outcome | Chi-square test, Odds Ratio, NHST, Fisher's exact | Logistic regression, discriminant analysis
Continuous Outcome  | t-test, ANOVA, Cohen's d                          | Regression, Correlation (Pearson/Spearman)

Predictor (Independent Variable): The variable you manipulate or consider as the cause or influencer. For example, in an experiment studying the effect of study hours on test scores, "study hours" would be the predictor.

Outcome (Dependent Variable): The variable that changes in response to the predictor. In the same example, "test scores" would be the outcome, as they depend on the amount of study time.

Means of groups (value):
• t-test (significance of the difference in means)
• Cohen's d (effect size)

r (degree of relatedness):
• Linear regression (best-fitting line)
• r² (effect of the best-fitting line)

Proportion of groups:
• Chi-square (significance of group differences)
• Odds ratio (effect of grouping)

What is Categorical Data? Discuss!


What is Categorical Data?

• Categorical Designations Are All Around Us
> Many social labels, e.g. rich/poor, lucky/unlucky,
sympathetic/hostile are categorical in nature.
> Categorical analyses help us explore relationships
between these types of labels.
> The tests are different, but the logic behind them is not new: they are
adaptations of the ones you have already met :)
• Characteristics of Categorical Variables
> Categorical variables have a limited set of values, unlike
continuous variables, which can take any value within a range.
> Because categorical data doesn't vary continuously, it
lacks a clear measure of variance.

Chi-Square Test
• Chi-Square Test: Origins and Applications
> Developed by Karl Pearson around 1900, the chi-square test examines
whether observed frequencies differ from expected values.
> This is the same Pearson known for Pearson's r.
> He is also known, controversially, for promoting eugenics and scientific racism.
• Primary Uses of the Chi-Square Test
> Goodness of Fit: To determine whether observed frequencies match an
expected distribution, e.g. whether the color distribution of M&Ms in a
bag matches the company's claimed proportions.
> Test of Independence: To assess whether two categorical variables are
independent or associated, e.g. whether there is a relationship between
sex (male, female) and preference for a type of drink (tea, coffee).

Calculation
Formula:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where
O = Observed value.
E = Expected value.
How do we calculate E?

1. Expected vs. Observed Frequencies

A meme creator wants to check whether the distribution of the types of memes shared on
social media matches their expected preferences based on previous data.
We usually calculate the expected frequencies ourselves!
> E.g. for 200 memes: 200 x expected preference = expected frequency
E.g. Funny: 200 x 0.5 = 100, Relatable: 200 x 0.3 = 60, and so on.

Meme Type     | Expected Pref. | Observed Freq. | Expected Freq.
Funny         | 50%            | 120            | 100
Relatable     | 30%            | 50             | 60
Political     | 10%            | 20             | 20
Inspirational | 10%            | 10             | 20

1. Expected vs. Observed Frequencies

Now, we need to do the actual chi-square calculation:

Meme Type     | Expected Pref. | Observed Freq. (O) | Expected Freq. (E) | (O-E) | (O-E)² | (O-E)² / E
Funny         | 50%            | 120                | 100                | 20    | 400    | 4
Relatable     | 30%            | 50                 | 60                 | -10   | 100    | 1.67
Political     | 10%            | 20                 | 20                 | 0     | 0      | 0
Inspirational | 10%            | 10                 | 20                 | -10   | 100    | 5

1. Add up the values in the last column (because of ∑, the sum): 𝜒² = 4 + 1.67 + 0 + 5 ≈ 10.67

2. Determine the degrees of freedom: df = number of categories - 1 = 4 - 1 = 3
3. Look up the critical value for 3 df at our significance level (e.g. 0.05) in a Chi-Square
Table (google!). If your calculated 𝜒² exceeds this critical value, you reject the null hypothesis,
suggesting that the observed distribution of meme types does not match the expected
distribution; if it's lower, you fail to reject the null hypothesis :) (The critical value is 7.81, and 10.67 > 7.81, so we reject.)
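The goodness-of-fit calculation above can be checked with a few lines of Python. This is a minimal, standard-library-only sketch: the function name `chi_square_gof` is illustrative, and the critical value 7.815 is the table value for df = 3 at α = 0.05, hard-coded rather than computed.

```python
# Pearson chi-square goodness-of-fit test for the meme example (stdlib only).

def chi_square_gof(observed, expected_props):
    """Return (chi2, df, expected) for a goodness-of-fit test.

    observed       -- observed counts per category
    expected_props -- expected proportions per category (should sum to 1)
    """
    n = sum(observed)
    expected = [n * p for p in expected_props]  # E = N x expected proportion
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # number of categories - 1
    return chi2, df, expected


# Order: Funny, Relatable, Political, Inspirational
observed = [120, 50, 20, 10]
expected_props = [0.5, 0.3, 0.1, 0.1]

chi2, df, expected = chi_square_gof(observed, expected_props)

# Critical value for df = 3 at alpha = 0.05, read from a chi-square table.
CRITICAL_DF3_ALPHA05 = 7.815

print(f"chi2 = {chi2:.2f}, df = {df}")  # chi2 = 10.67, df = 3
print("reject H0" if chi2 > CRITICAL_DF3_ALPHA05 else "fail to reject H0")
```

If SciPy is installed, `scipy.stats.chisquare(observed, f_exp=expected)` should give the same statistic together with a p-value, which avoids the table lookup.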

The Chi-Square Distribution

• The 𝜒² distribution is never negative because it is built from
squared differences.
• The degrees of freedom (df) determine the "center" of the
distribution: the mean of a 𝜒² distribution equals its df.
• The shape of the 𝜒² curve also depends on the degrees
of freedom, as with the t distribution.
• For large degrees of freedom, the 𝜒² distribution
starts to look more like the normal distribution.
• This is because a 𝜒² variable with df degrees of freedom is a sum of df independent
squared standard normal variables, and such a sum becomes approximately normal as df grows.

Requirements for Using Chi-Square
• Observations must be independent (e.g., Student A shouldn't
know or influence Student B's survey responses).
• No expected cell frequency should be zero in the contingency table.
• At least 80% of expected cell frequencies should be at least five
(some recommend ten as a minimum for accuracy).
• The total number of observations should ideally exceed 50
(at minimum, more than 20) to ensure reliable results.
Why? Looking at the distribution on the previous slide, at low df even small changes
in 𝜒² already produce large changes in p, so a few cells with small counts can noticeably shift the result.
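The count-based rules of thumb above can be turned into a small checker. This is a hypothetical helper, not a standard library function; it applies the rules to expected counts, uses the 80% / at-least-five and N ≥ 50 thresholds from the list, and cannot check independence of observations, which is a property of the study design rather than of the counts.

```python
# Hypothetical checker for the chi-square rules of thumb (expected counts only).

def check_chi_square_requirements(expected_counts, min_total=50):
    """Return a list of warnings; an empty list means all rules of thumb pass.

    expected_counts -- flat list of expected cell frequencies
    min_total       -- recommended minimum number of observations
    """
    warnings = []
    if any(e == 0 for e in expected_counts):
        warnings.append("some expected frequency is zero")
    share_ge_5 = sum(e >= 5 for e in expected_counts) / len(expected_counts)
    if share_ge_5 < 0.8:
        warnings.append(f"only {share_ge_5:.0%} of cells have E >= 5 (want >= 80%)")
    total = sum(expected_counts)
    if total < min_total:
        warnings.append(f"total N = {total:g} is below {min_total}")
    return warnings


# Expected counts from the meme example: all rules pass.
print(check_chi_square_requirements([100, 60, 20, 20]))  # []
# A small, sparse table triggers two warnings.
print(check_chi_square_requirements([2, 3, 4, 30]))
```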

2. Test for Independence

Are two categorical variables related?
Examples:
> Are people who enjoy outdoor activities also likely to prefer eco-friendly products?
> Do people who like cheese also tend to like tomatoes?
> Are people with certain political beliefs also more likely to hold specific social views?
Procedure:
> Make a contingency table showing the observed frequencies for each combination of categories.
> Calculate the expected value for each cell as if the variables were independent.
> Compare observed values to expected values to see whether the differences are statistically significant.

2. Test for Independence — Calculation of Expected Value

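A survey was conducted among two age groups (teens and adults) to see whether there
is a relationship between meme preference (Image / Video) and age group.

Observed:
       | Prefer Image | Prefer Video | Total
Teens  | 30           | 20           | 50
Adults | 10           | 40           | 50
Total  | 40           | 60           | 100

Expected value for each cell:
$$E_{i,j} = \frac{(\text{Row Total}_i) \times (\text{Column Total}_j)}{\text{Overall Total}}$$

E.g. expected number of teens who prefer image = 50 (teens) x 40 (image) / 100 (overall) = 20.

Expected:
       | Prefer Image | Prefer Video
Teens  | 20           | 30
Adults | 20           | 30

And then you can do the same as before (compute 𝜒², get df, look it up in a table) again :)
For a test of independence, df = (number of rows - 1) x (number of columns - 1), which is 1 for a 2x2 table.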
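The expected-value formula can be written as a short Python sketch (the function name `expected_counts` is illustrative). It computes the row and column totals and applies E = row total × column total / overall total to every cell of the teens/adults table.

```python
# Expected counts under independence: E[i][j] = row_total[i] * col_total[j] / N

def expected_counts(table):
    """table -- list of rows of observed counts (a contingency table)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]  # zip(*table) walks columns
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: Teens, Adults; columns: Prefer Image, Prefer Video
observed = [[30, 20],
            [10, 40]]
print(expected_counts(observed))  # [[20.0, 30.0], [20.0, 30.0]]
```

Because only the totals enter the formula, the expected table always has the same row and column totals as the observed one.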

Exercise: Food at KUA :)

We want to know if people who like the products at Wicked Rabbit (veggie)
also like those at Folkekøkken (omni).
We sample 270 random customers and want to know if there is a relationship
at the alpha = .05 level.

       | Veg+ | Veg- | Total
Omni+  | 122  | 32   | 154
Omni-  | 73   | 43   | 116
Total  | 195  | 75   | 270

Calculate 𝜒²!


Exercise: Food at KUA :)

1. Calculate expected values (shown in parentheses).
2. Calculate chi-square.
3. Calculate the degrees of freedom.
4. Determine the critical value in a table.

       | Veg+         | Veg-        | Total
Omni+  | 122 (111.22) | 32 (42.78)  | 154
Omni-  | 73 (83.78)   | 43 (32.22)  | 116
Total  | 195          | 75          | 270

Solution:
The chi-square statistic is 8.7514 with df = (2 - 1) x (2 - 1) = 1, so the critical value at alpha = .05 is 3.841.
8.7514 > 3.841, so we reject the null hypothesis.
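The four exercise steps can be sketched in standard-library Python as follows. The helper `chi_square_independence` is hypothetical; it uses df = (rows - 1) × (columns - 1) and no continuity correction, and the critical value 3.841 (df = 1, α = .05) is hard-coded from a chi-square table.

```python
# The four KUA exercise steps as a stdlib-only sketch.

def chi_square_independence(table):
    """Return (chi2, df, expected) for an r x c table, without continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Step 1: expected counts under independence
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Step 2: sum (O - E)^2 / E over every cell
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    # Step 3: df = (rows - 1) * (columns - 1)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected


# Rows: Omni+, Omni-; columns: Veg+, Veg-
kua = [[122, 32],
       [73, 43]]
chi2, df, expected = chi_square_independence(kua)

# Step 4: critical value for df = 1 at alpha = .05 (from a chi-square table)
CRITICAL_DF1_ALPHA05 = 3.841
print(f"chi2 = {chi2:.4f}, df = {df}")  # chi2 = 8.7514, df = 1
print("reject H0" if chi2 > CRITICAL_DF1_ALPHA05 else "fail to reject H0")
```

With SciPy, `scipy.stats.chi2_contingency(kua, correction=False)` should reproduce the same statistic and df; its default applies Yates' continuity correction to 2x2 tables, which gives a somewhat smaller value.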

Effect Sizes of Categorical Variables — Odds Ratio
Definition: The odds ratio (OR) is a measure of the strength of
association between two categorical variables, commonly in the context of a
2x2 contingency table. It's often used in studies looking at the association
between an exposure and an outcome, such as in medical or social science
research.

       | Veg+ | Veg-
Omni+  | 122  | 32
Omni-  | 73   | 43

Odds (omni+, veg) = 122 / 32 = 3.81 (odds of liking the veggie food among those who like the omni food)
Odds (omni-, veg) = 73 / 43 = 1.70 (odds of liking the veggie food among those who don't)
Odds ratio = (122 / 32) / (73 / 43) ≈ 2.25

> The odds of liking the omni food are about 2.25 times higher if you also like the veggie one (and vice versa). This is a ratio of odds, not of probabilities, so it does not mean you are 2.25 times as likely.
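The ratio-of-odds calculation can be reproduced in a few lines; this sketch keeps full precision for the intermediate odds instead of rounding them first. The function name `odds_ratio_from_odds` is illustrative, and the a/b/c/d layout follows the table above.

```python
# Odds ratio for the KUA table, computed as a ratio of two odds.

def odds_ratio_from_odds(a, b, c, d):
    """2x2 table layout:
           Veg+  Veg-
    Omni+   a     b
    Omni-   c     d
    Returns (odds of Veg+ among Omni+, odds of Veg+ among Omni-, their ratio).
    """
    odds_omni_pos = a / b  # 122 / 32
    odds_omni_neg = c / d  # 73 / 43
    return odds_omni_pos, odds_omni_neg, odds_omni_pos / odds_omni_neg


odds_pos, odds_neg, ratio = odds_ratio_from_odds(122, 32, 73, 43)
print(f"{odds_pos:.2f} {odds_neg:.2f} {ratio:.2f}")  # 3.81 1.70 2.25
```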

Effect Sizes of Categorical Variables — Odds Ratio

Alternatively, calculate (A x D) / (B x C):

       | Veg+ | Veg-
Omni+  | A    | B
Omni-  | C    | D

> (122 x 43) / (32 x 73) = 5246 / 2336 ≈ 2.25.
Interpreting Odds Ratios
● OR = 1: There is no association between exposure and outcome
(odds are the same in both groups).
● OR > 1: There is a positive association between exposure and
outcome (exposure is associated with higher odds of the outcome).
● OR < 1: There is a negative association between exposure and
outcome (exposure is associated with lower odds of the outcome).
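The cross-product formula and the interpretation rules can be sketched together. `odds_ratio` and `interpret` are illustrative names, and the tolerance used to decide OR = 1 is an arbitrary choice for floating-point values; the label says nothing about statistical significance. Swapping rows and columns gives the same value, which shows that the OR does not depend on which variable is treated as the exposure.

```python
# Cross-product odds ratio plus the OR = 1 / > 1 / < 1 reading rules.

def odds_ratio(a, b, c, d):
    """(A x D) / (B x C) for the 2x2 table
           Veg+  Veg-
    Omni+   A     B
    Omni-   C     D
    """
    return (a * d) / (b * c)


def interpret(or_value, tol=1e-9):
    """Map an odds ratio to the direction of association (no significance test)."""
    if abs(or_value - 1) < tol:
        return "no association"
    return "positive association" if or_value > 1 else "negative association"


kua = odds_ratio(122, 32, 73, 43)         # (122 * 43) / (32 * 73)
transposed = odds_ratio(122, 73, 32, 43)  # rows and columns swapped
print(f"{kua:.2f} {transposed:.2f} {interpret(kua)}")  # 2.25 2.25 positive association
```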

Thanks! :)
