Nonparametric Testing Using The Chi-Square Distribution: Reading Tips

This document discusses using chi-square tests to analyze nominal data from studies. It provides definitions of key terms like nominal data and nonparametric tests. The summary explains how to calculate expected frequencies, the chi-square value, degrees of freedom, and determine if the chi-square value indicates a statistically significant difference between observed and expected results. An example is provided of analyzing developmental abnormality rates between healthy and high-risk infants using a 2x2 contingency table. The document also discusses correcting for continuity, combining categories, and using chi-square to test for independence between variables.


reading tips

Nonparametric Testing Using the Chi-Square Distribution

As physical therapists, we evaluate and treat people with a
wide variety of problems and a wide variety of people with
similar problems. Numerous tests and treatment procedures
have been devised; unfortunately, insufficient effort has gone
into examining the results of our work. Do we know which
hip replacement treatment protocol achieves the best results?
Is there a difference in functional recovery of men versus
women with similar spinal cord injuries? Do high-risk infants
have a greater incidence of developmental problems than
healthy infants? Numerous ways exist to study and analyze
these questions. This paper will deal with one relatively simple
method of analyzing the data generated from studies designed
to answer these kinds of questions. We will describe a nonparametric statistical analysis using the chi-square (χ²) distribution.
COMPARING THE DISTRIBUTIONS OF A NOMINAL VARIABLE
Imagine for a moment that you are a pediatric physical therapist working in a major medical center. You do developmental screening of infants in the well-baby clinic and work with infants considered to be at high risk for developing developmental abnormalities later in life. You are interested in seeing whether the high-risk infants who survive have a higher incidence of developmental abnormalities later in life than the healthy infants. To answer this question, you analyze the records of 30 former high-risk infants and 30 former healthy infants and check how many have developmental abnormalities at 3 years of age. Your data could be summarized in a table as follows:

                       abnormalities    no abnormalities
healthy infants        a    5           b   25
high-risk infants      c   20           d   10

This 2 × 2 contingency table is the usual method of presenting data being analyzed with the chi-square statistic. (The Appendix provides the specifics of constructing a contingency table.) The data we have generated are termed "nominal data"; that is, the data can be categorized into frequency counts in mutually exclusive categories.1-3
Basically, when you apply the chi-square statistic, you compare the observed frequency in each cell (5, 25, 20, and 10 in the example) with the expected, or theoretical, frequency that we would obtain if the null hypothesis were true. The null hypothesis in this example would be that no difference exists in the incidence of abnormalities between the high-risk infants and the healthy infants.

DEFINITIONS
Before we explain how to perform the analysis, we will give some basic information about the chi-square test. The chi-square test is a nonparametric statistical test. Nonparametric, or distribution-free, methods of analysis, as opposed to parametric tests, do not assume a normal distribution of the population from which the samples are drawn. Nonparametric tests are used with ordinal and nominal data. They also can be used to analyze studies that have collected interval or ratio data; these data, however, will have to be reduced to nominal or ordinal data.2,4,5
The chi-square statistic gives a measure of the discrepancies between the observed and expected frequencies (nominal data). If the discrepancies are large, the chi-square value will be large. If no discrepancies exist, the chi-square value will be 0. Negative chi-square values do not occur (as you will see later) because the numerator in the chi-square formula always is squared and the denominator, an expected frequency, always is positive. This test is used when you want to know whether the differences between the observed and expected frequencies are significant. Although chi square can be used to analyze studies that collect ratio data, those data will have to be reduced to frequency data for analysis. The test usually is used with discontinuous frequency data classified into mutually exclusive categories (eg, pass-fail, yes-no). Most commonly, experimental studies with nonparametric hypotheses have independent groups of subjects being measured on some qualitative dependent variable.
ANALYSIS AND INTERPRETATION
If we are to compare the observed frequencies (fo) with the
expected frequencies (fe) under the null hypothesis, we need
to calculate the expected frequencies. This is a simple process.
First find the row (r) and column (c) totals and the total
number of observations (subjects). The expected frequencies
for each cell are calculated by multiplying the r sum that
contains the cell by the c sum that contains the cell and
dividing by the total number of subjects (N). In our example:

Cell a: $f_e = \frac{(30)(25)}{60} = 12.5$   (1)

Cell b: $f_e = \frac{(30)(35)}{60} = 17.5$   (2)

Cell c: $f_e = \frac{(30)(25)}{60} = 12.5$   (3)

Cell d: $f_e = \frac{(30)(35)}{60} = 17.5$   (4)
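
As a cross-check on equations (1) through (4), here is a minimal Python sketch of the rule "row sum times column sum, divided by N." The function name expected_frequencies and the list-of-rows layout of the table are illustrative assumptions, not part of the original article.

```python
def expected_frequencies(observed):
    """Expected count for each cell: (row sum x column sum) / N."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    n = sum(row_sums)  # total number of subjects
    return [[r * c / n for c in col_sums] for r in row_sums]


# Infant example: rows are healthy and high-risk infants;
# columns are abnormalities and no abnormalities.
observed = [[5, 25],
            [20, 10]]
print(expected_frequencies(observed))  # [[12.5, 17.5], [12.5, 17.5]]
```

The same function works unchanged for tables with more than two rows or columns, because it only needs the marginal totals.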

PHYSICAL THERAPY

Our table now reads

                       abnormalities       no abnormalities      row sum
healthy infants        a  fo = 5           b  fo = 25              30
                          fe = 12.5           fe = 17.5
high-risk infants      c  fo = 20          d  fo = 10              30
                          fe = 12.5           fe = 17.5
column sum                25                  35                   60

With this information, we now can apply the formula used to calculate the chi-square value:

$\chi^2 = \sum \frac{(O - E)^2}{E}$   (5)

where O denotes observed and E denotes expected frequencies.6-10 The formula instructs us simply to take the observed and expected values in a cell, subtract E from O, square that value, and divide by E. Last, we take each of the cell totals and add them together. In our example, we calculate chi square as follows:

$\chi^2 = 4.5 + 3.2 + 4.5 + 3.2$

$\chi^2 = 15.4$

The next step is to determine whether our calculated chi-square value is statistically significant. For the moment, let us assume it is statistically significant. What does that tell us? We would be able to say that the observed frequencies in our study differ enough from the calculated expected frequencies that we can reject our null hypothesis that no difference exists in the incidence of abnormalities between the high-risk and healthy groups. Because the high-risk group showed more abnormalities (20 of 30) than the healthy group (5 of 30), this gives support to the alternative hypothesis that the high-risk infants will have a greater incidence of developmental abnormalities later in life than will the healthy infants.

In fact, our chi-square value is statistically significant at the p < .05 level. To determine whether chi square is large enough to reject the null hypothesis, we take our chi-square value and the appropriate degrees of freedom (df) to a table of critical values of chi square. The formula for degrees of freedom is df = (r − 1)(c − 1). In our example, df = (2 − 1)(2 − 1) = 1. For a given level of significance and the appropriate degrees of freedom, if our chi square is larger than or equal to the value in the table of critical values, we conclude that we have statistical significance. In our example, for one degree of freedom at the .05 level of significance, we need a chi-square value equal to or greater than 3.84. Because our chi-square value was 15.4, we can conclude that a significant difference exists. To report the results of a chi-square analysis, it is common practice to report the chi-square value, the degrees of freedom, and the significance level. For our example, we report the analysis as χ² = 15.40, df = 1, p < .05.

Some researchers and statisticians recommend you use a correction for continuity to decrease the likelihood of error in estimating probabilities when you have small expected frequencies (some say when fe < 5, some say when fe < 10, some say always, and some say never). This correction reduces the chi-square value by reducing by .5 the obtained frequencies that are greater than expectation and increasing by .5 the obtained frequencies that are less than expectation. This procedure can be done simply by subtracting .5 from the absolute difference between fo and fe before the difference is squared:

$(|f_o - f_e| - 0.5)^2$   (6)

Our corrected chi-square value would be calculated as follows:

$\chi^2 = 3.92 + 2.8 + 3.92 + 2.8$

$\chi^2 = 13.44$

In our example, the correction for continuity reduced our calculated chi-square value from 15.4 to 13.44.

2 × 2 Special Formula
For the 2 × 2 table only, a special alternative formula can be used that makes the calculation easier. The uncorrected formula is

$\chi^2 = \frac{N(AD - BC)^2}{(A + B)(C + D)(A + C)(B + D)}$   (7)

Note that the numerator AD − BC is the difference between the two cross products and the denominator is the product of the four marginal totals. The corrected formula would be

$\chi^2 = \frac{N(|AD - BC| - N/2)^2}{(A + B)(C + D)(A + C)(B + D)}$   (8)
Using this formula for our example,

                       abnormalities    no abnormalities    row sum
healthy infants        a    5           b   25                30
high-risk infants      c   20           d   10                30
column sums                 25              35                60

$\chi^2 = \frac{60(|(5)(10) - (25)(20)| - 60/2)^2}{(30)(30)(25)(35)} = \frac{60(420)^2}{787{,}500} = 13.44$,

the exact answer we obtained from the corrected formula.
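
The whole 2 × 2 procedure, equations (5) through (8) plus the comparison with the tabled critical value of 3.84, can be sketched in Python as follows. The function names are hypothetical, and the continuity correction is applied exactly as in equation (6), subtracting .5 from every |fo − fe| with no further safeguard.

```python
def chi_square(observed, yates=False):
    """Pearson chi square, equation (5); with yates=True, each |O - E| is
    reduced by .5 before squaring, as in equation (6)."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    n = sum(row_sums)
    correction = 0.5 if yates else 0.0
    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_sums[i] * col_sums[j] / n  # expected frequency for this cell
            total += (abs(o - e) - correction) ** 2 / e
    return total


def chi_square_2x2(a, b, c, d, yates=False):
    """Shortcut for a 2 x 2 table (cells a, b in row 1; c, d in row 2):
    equation (7), or equation (8) when yates=True."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff -= n / 2
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def degrees_of_freedom(observed):
    """df = (r - 1)(c - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


observed = [[5, 25],    # healthy: abnormalities, no abnormalities
            [20, 10]]   # high-risk
CRITICAL_05_DF1 = 3.84  # tabled critical value, df = 1, alpha = .05

chi2 = chi_square(observed)
print(round(chi2, 2), round(chi_square(observed, yates=True), 2))  # 15.43 13.44
print(round(chi_square_2x2(5, 25, 20, 10), 2),
      round(chi_square_2x2(5, 25, 20, 10, yates=True), 2))         # 15.43 13.44
print(degrees_of_freedom(observed), chi2 >= CRITICAL_05_DF1)       # 1 True
```

The cell-by-cell sum and the shortcut agree, with and without the correction. Unrounded, the uncorrected value is 15.43; the article's 15.4 comes from rounding each cell term before adding.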

COMBINING CATEGORIES
If your study has a number of categories that have small observed frequencies, it is advisable to combine categories, when possible, to increase the number of observations in each cell. For example, if in your study two out of six categories are similar and have very small observed frequencies, combining the two categories into one will give you a larger frequency count.
The problem with small expected frequencies is that you increase the likelihood of making an error in estimating probabilities from the theoretical frequency curve used to determine critical values. Thus, a good policy is to combine categories with small observed frequencies when possible.1,2
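
A minimal sketch of the bookkeeping involved in combining categories before running the test, assuming the table is stored as a list of rows with one label per column. The function combine_categories, the labels, and the counts in the usage example are hypothetical; the merged category is appended as the last column.

```python
def combine_categories(observed, labels, to_merge, new_label):
    """Merge the columns named in `to_merge` into one column labeled `new_label`.

    The remaining columns keep their order; the merged column is appended last.
    """
    keep = [j for j, label in enumerate(labels) if label not in to_merge]
    merge = [j for j, label in enumerate(labels) if label in to_merge]
    new_labels = [labels[j] for j in keep] + [new_label]
    new_observed = [[row[j] for j in keep] + [sum(row[j] for j in merge)]
                    for row in observed]
    return new_observed, new_labels


# Hypothetical counts: categories C and D are similar and sparsely filled.
observed = [[12, 9, 2, 1],
            [10, 14, 3, 2]]
labels = ["A", "B", "C", "D"]
print(combine_categories(observed, labels, {"C", "D"}, "C+D"))
# ([[12, 9, 3], [10, 14, 5]], ['A', 'B', 'C+D'])
```

Combining two columns also lowers the degrees of freedom by one, since df = (r − 1)(c − 1).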

COMPARING VARIABLES FOR INDEPENDENCE


The chi-square statistic can be used for more than comparing the distributions of a nominal variable. It also can be used for tests of independence.2,7
Suppose you are a physical therapy education program director. You want to know whether a relationship exists between the academic grades your students earn and their ratings on clinical performance or whether the two variables are independent of each other. To test the hypothesis that clinical performance is independent of academic performance, you place the last 100 graduated students into the following categories: 1) academic performance: above median and below median, and 2) clinical performance: below average, average, and above average. You can summarize the data in a 2 × 3 contingency table as follows:

                                       Clinical Performance
Academic Performance     below average     average     above average     row sums
above median                  11             25             35              71
below median                  15              7              7              29
column sums                   26             32             42             100 (total)


In tests of independence, two nominal variables can be compared at one time. Once the observed frequencies have been tabulated, the expected frequencies can be calculated in the same way we described earlier. For our example, the table with both expected and observed frequencies would be the following:

                                       Clinical Performance
Academic Performance     below average     average       above average     row sums
above median             fo = 11           fo = 25       fo = 35              71
                         fe = 18.46        fe = 22.72    fe = 29.82
below median             fo = 15           fo = 7        fo = 7               29
                         fe = 7.54         fe = 9.28     fe = 12.18
column sums              26                32            42                  100 (total)


The value of chi square also is calculated in the same manner. The calculated chi square in this example is 14.287, with df = (2 − 1)(3 − 1) = 2.
If the two variables being tested were independent, we would expect the analysis to yield a small chi-square value that was not statistically significant, because any differences between the observed and expected frequencies would be small and attributable to chance. In our example, the chi-square value is large and statistically significant (for 2 degrees of freedom at the .05 level, the critical value is 5.99). Therefore, we conclude that academic performance and clinical performance are not independent and, indeed, an association exists between the two variables.
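
The 2 × 3 analysis can be reproduced with the same expected-frequency rule. This Python sketch assumes the observed counts of the academic/clinical example (rows: above and below median; columns: below average, average, above average) and compares the result with the tabled .05 critical value for 2 degrees of freedom, 5.99; the function name is hypothetical.

```python
def chi_square_independence(observed):
    """Return (chi-square, df, expected) for an r x c table of counts."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    n = sum(row_sums)
    expected = [[r * c / n for c in col_sums] for r in row_sums]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected


observed = [[11, 25, 35],   # above median
            [15, 7, 7]]     # below median
CRITICAL_05_DF2 = 5.99      # tabled critical value, df = 2, alpha = .05

chi2, df, expected = chi_square_independence(observed)
print([[round(e, 2) for e in row] for row in expected])
# [[18.46, 22.72, 29.82], [7.54, 9.28, 12.18]]
print(round(chi2, 3), df, chi2 >= CRITICAL_05_DF2)  # 14.287 2 True
```

Each row of expected frequencies sums back to its observed row total (71 and 29), which is a quick check on the arithmetic.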
CORRELATION COEFFICIENTS
On occasion, a nonparametric correlation coefficient (similar to the parametric Pearson product-moment correlation coefficient) is needed to describe the degree of association between variables in a contingency table.2 The appropriate correlation coefficient for a 2 × 2 table is the phi (φ) coefficient. The equation for phi is as follows:

$\phi = \sqrt{\frac{\chi^2}{N}}$   (9)

The appropriate correlation coefficient for tables in which r, c, or both are greater than 2 is the contingency coefficient (C), calculated from the following formula:

$C = \sqrt{\frac{\chi^2}{\chi^2 + N}}$   (10)

where N is the total number of subjects.
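
Equations (9) and (10) are one-liners in Python. The sketch below applies phi to the uncorrected infant chi square (15.43, N = 60), a value the article does not itself convert to phi, and C to the 2 × 3 example; the function names are hypothetical.

```python
import math


def phi_coefficient(chi2, n):
    """Phi for a 2 x 2 table, equation (9): sqrt(chi-square / N)."""
    return math.sqrt(chi2 / n)


def contingency_coefficient(chi2, n):
    """Contingency coefficient, equation (10): sqrt(chi-square / (chi-square + N))."""
    return math.sqrt(chi2 / (chi2 + n))


# Uncorrected chi square for the infant example (N = 60)
print(round(phi_coefficient(15.43, 60), 2))            # 0.51
# Chi square for the 2 x 3 academic/clinical example (N = 100)
print(round(contingency_coefficient(14.287, 100), 2))  # 0.35
```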


Volume 66 / Number 2, February 1986


The minimum value of phi and the contingency coefficient is 0. The maximum value of the contingency coefficient depends on the number of rows and columns: in a 2 × 2 table the maximum value is .707, and in a 3 × 3 table it is .816. The formula to calculate maximum values in tables when the number of rows is equal to the number of columns is as follows:

$C_{\max} = \sqrt{\frac{k - 1}{k}}$   (11)

where k is the number of rows or columns. Contingency coefficients can be compared with other contingency coefficients only if they have been calculated from tables containing the same number of rows and columns.
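
Because C cannot reach 1, it helps to see how its ceiling grows with table size. This sketch evaluates sqrt((k − 1)/k), the standard expression for the maximum of C in a k × k table, which we take to be equation (11); the function name is hypothetical.

```python
import math


def max_contingency_coefficient(k):
    """Largest attainable C in a k x k table, equation (11): sqrt((k - 1) / k)."""
    return math.sqrt((k - 1) / k)


for k in (2, 3, 4):
    print(k, round(max_contingency_coefficient(k), 3))
# 2 0.707
# 3 0.816
# 4 0.866
```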
In our example, we have found that an association exists between academic performance and clinical performance. The degree of that association can be computed as follows:

$C = \sqrt{\frac{14.287}{14.287 + 100}}$

C = .35

This would signify a mild positive association.

SUMMARY
We have described three separate uses of the chi-square distribution: 1) comparing observed and expected frequency distributions of a nominal variable, 2) testing for the independence of two variables, and 3) using the chi-square test in determining correlation coefficients. We hope this paper has helped you gain an understanding of the uses of the chi-square test and the steps required to calculate this statistic.

PHILIP L. WITT
Assistant Professor
PETER McGRAIN, PhD
Assistant Professor
Div of Physical Therapy
Medical School Wing E 222H
Univ of North Carolina at Chapel Hill
Chapel Hill, NC 27514

REFERENCES
1. Michels E: Design of Research and Analysis of Data in the Clinic: An Introductory Manual for Clinical Research. Alexandria, VA, American Physical Therapy Association, 1982
2. Ferguson GA: Statistical Analysis in Psychology and Education, ed 4. New York, NY, McGraw-Hill Inc, 1976
3. Maxwell AE: Analyzing Qualitative Data. London, England, Chapman and Hall, 1975
4. Lancaster HO: The Chi-Squared Distribution. New York, NY, John Wiley & Sons Inc, 1969
5. Nesbitt JE: Chi-Square: Statistical Guides in Educational Research, no. 2. England, Manchester University Press, 1966
6. Noether G: Introduction to Statistics: A Fresh Approach. Boston, MA, Houghton Mifflin Co, 1971
7. Hays WL: Statistics for the Social Sciences, ed 2. New York, NY, Holt, Rinehart & Winston General Book, 1973
8. Howell DC: Statistical Methods for Psychology. Boston, MA, Duxbury Press, 1982
9. Dayton CM: The Design of Educational Experiments. New York, NY, McGraw-Hill Inc, 1970
10. Dotson CD, Kirkendall DR: Statistics for Physical Education, Health and Recreation. New York, NY, Harper & Row, Publishers Inc, 1974

APPENDIX
Contingency Tables
To display the results of a study being analyzed by the chi-square statistic, you construct a contingency table. A contingency table is set up by rows (r) and columns (c). There is no limit to how many rows and columns you can use.2,8
By custom in the contingency table, the rows depict the independent variables and the columns depict the categories of the dependent variables. The far right column represents the best, highest, or most positive category of the dependent variable. Each block of the table is called a cell and is commonly labeled for identification.

                     Dependent Variable
                                                  row sum
Group 1              a              b
Group 2              c              d
column sum                                        total
