8_1_categorical_data_ninell
Lecture 14
Empirical Methods 2 & Theory of Science
29.10.2024 2
Last time
• Recap Outliers
• Problems with Null Hypothesis
Significance Testing (NHST)
• Effect sizes (Cohen’s d)
• Correlation (Pearson’s r)
Today :)
• ‘Families’ of statistical relationships and
their associated tests and metrics
• Statistical Analysis of Categorical Variables
• Chi-Square Test
• Categorical effect sizes: Odds Ratio
Families of statistical relationships

Data Type (Outcome)    Categorical Predictor (two, three groups)           Continuous Predictor (many “groups”)
Categorical Outcome    Chi-square test, Odds Ratio, NHST, Fisher's exact   Logistic regression, discriminant analysis
Continuous Outcome     Cohen’s d, t-test, ANOVA                            Regression, Correlation (Pearson/Spearman)

Predictor (Independent Variable):
The variable you manipulate or consider as the cause or influencer. For example,
in an experiment studying the effect of study hours on test scores, "study hours"
would be the predictor.

Outcome (Dependent Variable):
The variable that changes in response to the predictor. In the same example,
"test scores" would be the outcome, as they depend on the amount of study time.
Chi-Square Test
• Chi-Square Test: Origins and Applications
> Developed by Karl Pearson around 1900, the chi-square test examines
whether observed frequencies differ from expected values.
> This is the same Pearson who is known for Pearson’s r.
> He is also known, controversially, for promoting eugenics and scientific racism.
• Primary Uses of the Chi-Square Test
> Goodness of Fit: To determine if observed frequencies match an
expected distribution, e.g. determine if the color distribution of M&Ms in a
bag matches the company’s claimed proportions.
> Test of Independence: To assess whether two categorical variables are
independent or associated, e.g. examine if there is a relationship between
sex (male, female) and preference for a type of drink (tea, coffee)
Calculation
Formula:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where
O = Observed value.
E = Expected value.
How do we calculate E?
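For a goodness-of-fit test, each expected frequency is usually the total number of observations multiplied by the expected proportion for that category. A minimal Python sketch of that calculation is shown here; the counts and proportions are made-up illustration values (not data from the lecture), and `chi_square_gof` is a hypothetical helper name.

```python
def chi_square_gof(observed, expected_props):
    """Return (chi-square statistic, expected counts) for a goodness-of-fit test."""
    total = sum(observed)
    # E = N * expected proportion, one value per category
    expected = [total * p for p in expected_props]
    # chi^2 = sum over categories of (O - E)^2 / E
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, expected


# Hypothetical data: 100 observations in 4 categories, equal preference expected.
observed = [30, 20, 25, 25]
expected_props = [0.25, 0.25, 0.25, 0.25]

chi2, expected = chi_square_gof(observed, expected_props)
df = len(observed) - 1  # goodness of fit: number of categories minus one
print(expected, chi2, df)
```

With these illustrative numbers every expected count is 25, so only the first two categories contribute to the statistic.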
Meme Type   Expected Pref.   Observed Freq.   Expected Freq.   (O-E)   (O-E)²   (O-E)² / E
Requirements for Using Chi-Square
• Observations must be independent (e.g., Student A shouldn't
know or influence Student B’s survey responses).
• No expected cell frequency should be zero in the contingency table.
• At least 80% of expected cell frequencies should be greater than five
(some recommend ten as a minimum for accuracy).
• The total number of observations should ideally exceed 50
(at minimum, more than 20) to ensure reliable results.
Why? Looking at the chi-square distribution (previous slide), you can see
that at low df even small changes in χ² produce large changes in p.
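The rules of thumb above can be checked mechanically before running the test. The following is a rough sketch, assuming the cell frequencies are passed in as a flat list (many textbooks apply these rules to the expected frequencies). The function name `check_chi_square_requirements` and the example numbers are hypothetical, and the independence of observations cannot be checked from the counts alone.

```python
def check_chi_square_requirements(cells):
    """Check the rule-of-thumb requirements for a flat list of cell frequencies."""
    n_total = sum(cells)
    # Share of cells whose frequency is greater than five
    share_above_five = sum(1 for c in cells if c > 5) / len(cells)
    return {
        "no_zero_cells": all(c > 0 for c in cells),
        "at_least_80pct_above_5": share_above_five >= 0.8,
        "n_above_20": n_total > 20,   # bare minimum
        "n_above_50": n_total > 50,   # ideal
    }


# Made-up cell frequencies for illustration only.
report = check_chi_square_requirements([12, 8, 3, 9, 15])
for rule, passed in report.items():
    print(rule, passed)
```

Here four of the five cells exceed five (exactly 80%), and the total of 47 clears the minimum of 20 but not the recommended 50.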
And then you can do the same as before (get df, look up in a table) again :)
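For a test of independence, the expected count of each cell is commonly computed as (row total × column total) / N, and the degrees of freedom are (rows − 1) × (columns − 1). A minimal sketch under those assumptions follows; the 2x2 table is invented for illustration (for instance sex by drink preference) and `chi_square_independence` is a hypothetical helper name.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, df) for a contingency table given as rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical counts: rows = two groups, columns = (tea, coffee).
table = [[20, 30],
         [30, 20]]
chi2, df = chi_square_independence(table)
print(chi2, df)
```

The resulting statistic is then compared with the critical value for the computed df, exactly as in the goodness-of-fit case.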
Solution:
The chi-square statistic is 8.7514. With df = 3, the critical value (at α = 0.05) is 7.815.
Since 8.7514 > 7.815, we reject the null hypothesis.
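The table lookup can also be reproduced in code. The sketch below hard-codes the standard α = 0.05 critical values for small df and, as an optional extra, computes the upper-tail p-value using the closed-form expressions that exist for integer df. The names `CRITICAL_05` and `chi2_sf` are hypothetical, and only the Python standard library is assumed.

```python
import math

# Standard chi-square critical values at alpha = 0.05, keyed by df.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-z) * sum_{j=0}^{df/2 - 1} z^j / j!
        term = math.exp(-z)
        total = term
        for j in range(1, df // 2):
            term *= z / j
            total += term
        return total
    # Odd df: erfc(sqrt(z)) + sum_{j=1}^{(df-1)/2} z^(j - 1/2) * exp(-z) / Gamma(j + 1/2)
    total = math.erfc(math.sqrt(z))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-z + (j - 0.5) * math.log(z) - math.lgamma(j + 0.5))
    return total


statistic, df = 8.7514, 3
reject = statistic > CRITICAL_05[df]  # same decision rule as the table lookup
p_value = chi2_sf(statistic, df)
print(reject, round(p_value, 4))
```

Comparing the statistic with the critical value and comparing the p-value with α lead to the same decision.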
Effect Sizes of Categorical Variables — Odds Ratio
Definition: The odds ratio (OR) is a measure to determine the strength of
association between two categorical variables, commonly in the context of a
2x2 contingency table. It’s often used in studies looking at the association
between an exposure and an outcome, such as in medical or social science
research.
         Veg+   Veg-
Omni+    122    32
Omni-     73    43

Odds (omni+, veg) = 122 / 32 = 3.81
Odds (omni-, veg) = 73 / 43 = 1.70
Odds ratio = 3.81 / 1.70 = 2.24
> The odds of liking the omni food are about 2.24 times higher if you also like the veggie one
(the odds ratio compares odds, not probabilities).
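The same arithmetic can be written as a short Python sketch using the counts from the 2x2 table above; `odds_ratio` is a hypothetical helper name. Note that carrying the unrounded odds through the division gives roughly 2.25, whereas dividing the rounded values 3.81 and 1.70 gives 2.24.

```python
def odds_ratio(table):
    """Return (odds in row 1, odds in row 2, odds ratio) for a 2x2 table."""
    (a, b), (c, d) = table
    odds_row1 = a / b  # e.g. Veg+ vs Veg- among Omni+
    odds_row2 = c / d  # e.g. Veg+ vs Veg- among Omni-
    return odds_row1, odds_row2, odds_row1 / odds_row2


# Rows: Omni+, Omni-; columns: Veg+, Veg- (counts from the slide).
table = [[122, 32],
         [73, 43]]
odds_plus, odds_minus, ratio = odds_ratio(table)
print(round(odds_plus, 2), round(odds_minus, 2), round(ratio, 2))
```

An odds ratio of 1 would indicate no association; values further from 1 indicate a stronger association.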
Thanks! :)