# Nominal vs. Nominal

## Part 3a: Test for association (Pearson chi-square test of independence)

To test whether two nominal variables are associated, the most commonly used test is the **Pearson chi-square test of independence**. If the significance (the *p*-value) of this test is below the chosen significance level (usually 0.05), the two nominal variables have a significant association.

One problem, though, is that the Pearson chi-square test should only be used if not too many cells have a so-called expected count of less than 5, and the minimum expected count is at least 1. So you will first have to check whether these conditions are met. Most often 'not too many cells' is set at no more than 20% of the cells. Note that there are others who say that all cells should have an expected count of at least 5.
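The two conditions can be checked with a few lines of code. The sketch below is only an illustration: the function name `expected_count_check` and its parameter are made up for this example, the expected count of a cell is taken as row total times column total divided by the grand total, and the counts are those of the brand table used in the manual example on this page.

```python
def expected_count_check(observed, max_share_below_5=0.20):
    """Check the expected-count conditions for a table given as a list of rows.

    Returns (share of cells with expected count < 5, minimum expected count, ok).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count of each cell: row total * column total / grand total.
    expected = [r * c / n for r in row_totals for c in col_totals]
    share_below_5 = sum(e < 5 for e in expected) / len(expected)
    min_expected = min(expected)
    ok = share_below_5 <= max_share_below_5 and min_expected >= 1
    return share_below_5, min_expected, ok


# Brand table from the manual example (rows: Nike, Adidas, Puma).
share, smallest, ok = expected_count_check([[10, 8], [6, 4], [14, 8]])
print(f"{share:.1%} of cells below 5, minimum expected count {smallest}, ok: {ok}")
```

For this table one of the six cells has an expected count below 5 and the smallest expected count is 4, so under the 20% rule the test could still be used, while under the stricter 'all cells at least 5' rule it could not.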

Once you have checked the conditions and looked at the results, you can report the test results. In the example the percentage of cells with an expected count less than 5 is actually 0%, so it is okay to use the test. The test results could then be reported as something like:

Gender and marital status showed a significant association, *χ*^{2}(4, *N* = 1941) = 16.99, *p* = .002.

**Manually (Formulas and example)**

**Formulas**

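The formula for the Pearson chi-square value is:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{\left(O_{i,j} - E_{i,j}\right)^2}{E_{i,j}}$$

In this formula $r$ is the number of rows and $c$ the number of columns, $O_{i,j}$ is the observed frequency in row $i$ and column $j$, and $E_{i,j}$ is the expected frequency in row $i$ and column $j$. The expected frequency can be determined using:

$$E_{i,j} = \frac{R_i \times C_j}{N}$$

In this formula $R_i$ is the total of all observed frequencies in row $i$, $C_j$ the total of all observed frequencies in column $j$, and $N$ the grand total of all observed frequencies. In formula notation:

$$R_i = \sum_{j=1}^{c} O_{i,j}, \qquad C_j = \sum_{i=1}^{r} O_{i,j}, \qquad N = \sum_{i=1}^{r} R_i = \sum_{j=1}^{c} C_j = \sum_{i=1}^{r} \sum_{j=1}^{c} O_{i,j}$$

The degrees of freedom is the number of rows minus one, multiplied by the number of columns minus one. In formula notation:

$$df = (r - 1) \times (c - 1)$$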
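The formulas in this section translate almost line by line into code. The following is a minimal sketch in plain Python, not a reference implementation: the function name `pearson_chi_square` is an assumption made for this illustration, and the table passed in is the brand table from the example on this page.

```python
def pearson_chi_square(observed):
    """Return (chi2, df, expected) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]        # R_i
    col_totals = [sum(col) for col in zip(*observed)]  # C_j
    n = sum(row_totals)                                # N
    # E_ij = R_i * C_j / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # chi2 = sum over all cells of (O_ij - E_ij)^2 / E_ij
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    # df = (r - 1) * (c - 1)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


chi2, df, expected = pearson_chi_square([[10, 8], [6, 4], [14, 8]])
print(round(chi2, 4), df)
```

The function assumes every row and column total is positive; a row or column that is entirely zero would make an expected count zero and the division fail.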

**Example**

*Note*: this example uses different data than the rest of this page.

We are given the following table with observed frequencies.

Brand | Red | Blue |
---|---|---|
Nike | 10 | 8 |
Adidas | 6 | 4 |
Puma | 14 | 8 |

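There are three rows, so $r = 3$, and two columns, so $c = 2$. Then we can determine the row totals:

$$R_1 = 10 + 8 = 18, \qquad R_2 = 6 + 4 = 10, \qquad R_3 = 14 + 8 = 22$$

The column totals:

$$C_1 = 10 + 6 + 14 = 30, \qquad C_2 = 8 + 4 + 8 = 20$$

The grand total, for which all three formulas give the same result:

$$\begin{aligned}
N &= \sum_{i=1}^{3} R_i = 18 + 10 + 22 = 50 \\
N &= \sum_{j=1}^{2} C_j = 30 + 20 = 50 \\
N &= \sum_{i=1}^{3} \sum_{j=1}^{2} O_{i,j} = 10 + 8 + 6 + 4 + 14 + 8 = 50
\end{aligned}$$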

We can add the totals to our table:

Brand | Red | Blue | Total |
---|---|---|---|
Nike | 10 | 8 | 18 |
Adidas | 6 | 4 | 10 |
Puma | 14 | 8 | 22 |
Total | 30 | 20 | 50 |

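Next we calculate the expected frequencies for each cell:

$$\begin{aligned}
E_{1,1} &= \frac{R_1 \times C_1}{N} = \frac{18 \times 30}{50} = 10.8, & E_{1,2} &= \frac{R_1 \times C_2}{N} = \frac{18 \times 20}{50} = 7.2 \\
E_{2,1} &= \frac{R_2 \times C_1}{N} = \frac{10 \times 30}{50} = 6, & E_{2,2} &= \frac{R_2 \times C_2}{N} = \frac{10 \times 20}{50} = 4 \\
E_{3,1} &= \frac{R_3 \times C_1}{N} = \frac{22 \times 30}{50} = 13.2, & E_{3,2} &= \frac{R_3 \times C_2}{N} = \frac{22 \times 20}{50} = 8.8
\end{aligned}$$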

An overview of these in a table might be helpful:

Brand | Red | Blue | Total |
---|---|---|---|
Nike | 10.8 | 7.2 | 18 |
Adidas | 6 | 4 | 10 |
Puma | 13.2 | 8.8 | 22 |
Total | 30 | 20 | 50 |

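Note that the totals remain the same. Now for the chi-square value. For each cell we need to determine:

$$\frac{\left(O_{i,j} - E_{i,j}\right)^2}{E_{i,j}}$$

So again six times:

$$\begin{aligned}
\frac{(10 - 10.8)^2}{10.8} &\approx 0.0593, & \frac{(8 - 7.2)^2}{7.2} &\approx 0.0889 \\
\frac{(6 - 6)^2}{6} &= 0, & \frac{(4 - 4)^2}{4} &= 0 \\
\frac{(14 - 13.2)^2}{13.2} &\approx 0.0485, & \frac{(8 - 8.8)^2}{8.8} &\approx 0.0727
\end{aligned}$$

Then the chi-square value is the sum of all of these:

$$\chi^2 \approx 0.0593 + 0.0889 + 0 + 0 + 0.0485 + 0.0727 = 0.2694$$

The degrees of freedom is:

$$df = (r - 1) \times (c - 1) = (3 - 1) \times (2 - 1) = 2$$

To determine the significance you then need to determine the area under the chi-square distribution curve to the right of the chi-square value. With $F$ the cumulative distribution function of the chi-square distribution with $df$ degrees of freedom, in formula notation:

$$p = 1 - F\left(\chi^2; df\right) = 1 - F(0.2694; 2) \approx 0.874$$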

This is usually done with the aid of either a distribution table, or some software.
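As an illustration of the software route, the right-tail area can be computed with Python's standard library alone when the degrees of freedom are a whole number. The sketch below is one possible approach, not the method any particular package uses: the function name `chi2_sf` is an assumption, and it relies on the recurrence $Q(a + 1, x) = Q(a, x) + x^a e^{-x} / \Gamma(a + 1)$ for the regularized upper incomplete gamma function, starting from $Q(1, x) = e^{-x}$ for even $df$ and $Q(1/2, x) = \operatorname{erfc}(\sqrt{x})$ for odd $df$.

```python
import math


def chi2_sf(x, df):
    """Right-tail area P(X >= x) of a chi-square distribution with integer df >= 1."""
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)              # Q(1, h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))   # Q(1/2, h)
    # Step a up by 1 until it reaches df / 2, adding one term per step.
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q


p = chi2_sf(0.2694, 2)
print(round(p, 3))
```

With the chi-square value of about 0.2694 and 2 degrees of freedom from the example, this gives a significance of roughly 0.874, far above 0.05, so the brand table shows no significant association. Statistical packages provide the same quantity directly, for instance `scipy.stats.chi2.sf` in SciPy.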

You might now also wonder what the association actually is (for example, which marital statuses differ between men and women). This will be the topic of the next page.

FAQs:

## What if I do not meet the conditions?

If your data does not meet the two criteria, all is not lost. You could perhaps combine some categories that have a low count (e.g. combine all marital statuses other than married into one), or you can perform a so-called **Fisher exact test**.

With a Fisher exact test we only need to check the significance, and the interpretation is similar to that of the chi-square test. In the report this might go something like:

A two-sided Fisher exact test showed that gender and marital status have a significant association (*N* = 1941, *p* < .001).
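To give an idea of what the test does, here is a minimal sketch of a two-sided Fisher exact test for the simplest case, a 2 by 2 table. It is an illustration under stated assumptions rather than a general implementation: the function name `fisher_exact_2x2` and the counts are hypothetical, and tables with more rows or columns (such as gender by marital status) need software that supports larger tables. With the row and column totals held fixed, the top-left cell follows a hypergeometric distribution, and the two-sided significance adds up the probabilities of all tables that are at most as likely as the observed one.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)   # smallest possible top-left count
    highest = min(row1, col1)      # largest possible top-left count
    p_observed = prob(a)
    # Add every table that is as likely as, or less likely than, the observed one
    # (the small tolerance guards against floating-point ties).
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# Hypothetical counts, for illustration only.
p = fisher_exact_2x2(3, 1, 1, 3)
print(round(p, 4))
```

For these hypothetical counts the possible tables have probabilities 1/70, 16/70, 36/70, 16/70 and 1/70, so the two-sided significance is 34/70, about 0.486.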

## What are these 'expected values'?

The expected values are the numbers of respondents you would expect in each cell if the two variables were independent.

If for example I had 50 male and 50 female respondents, and 50 agreed with a statement and 50 disagreed with the statement, the expected value for each combination (male-agree, female-agree, male-disagree, and female-disagree) would be 25.

Note that if in the survey the real results were that all males disagreed and all females agreed, there would be a full dependency (i.e. gender fully decides whether you agree or disagree), even though the row and column totals would still be 50. In essence, the Pearson chi-square test checks whether your data is closer to the expected values (independence) or to the full dependency.
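Worked out in numbers, the two extremes of this example look as follows. This is only a sketch of the arithmetic, taking the expected count of a cell as row total times column total divided by the grand total, and the chi-square value as the sum over all four cells of the squared difference between observed and expected count, divided by the expected count.

$$E = \frac{50 \times 50}{100} = 25$$

If every cell is observed exactly 25 times (independence):

$$\chi^2 = 4 \times \frac{(25 - 25)^2}{25} = 0$$

If the observed counts are 0, 50, 50 and 0 (full dependency):

$$\chi^2 = \frac{(0 - 25)^2}{25} + \frac{(50 - 25)^2}{25} + \frac{(50 - 25)^2}{25} + \frac{(0 - 25)^2}{25} = 100$$

Real data will usually fall somewhere between these two values, and the significance tells you how unusual the observed value would be if the variables were independent.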

## Who came up with this stuff?

The Pearson chi-square test is named after Karl Pearson, who described the test in 1900.

The condition of at most 20% is often attributed to Cochran (1954, p. 420), whereas Fisher (1925, p. 83) was stricter and did not allow any cells with an expected count of less than 5.

The Fisher exact test is named after Ronald Aylmer Fisher who described the test in 1925.

## Are there any alternatives or variations?

The Pearson chi-square test and the Fisher exact test are probably the two most frequently used tests in this situation. However, other tests also exist, which some claim perform even better. An alternative to the Pearson chi-square test is the **G-test** (also known as a **likelihood ratio test**), and alternatives to the Fisher exact test are the **Barnard test** and the **Boschloo test**.
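To show how close the G-test is to the Pearson test, here is a minimal sketch of the G statistic, which is commonly defined as $G = 2 \sum O \ln(O / E)$ over all cells, with the same expected counts and degrees of freedom as the Pearson chi-square test. The function name `g_statistic` is an assumption for this illustration, and the counts are those of the brand table from the manual example on this page.

```python
import math


def g_statistic(observed):
    """Return (G, df) for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under independence
            if o > 0:  # an empty cell contributes nothing (0 * ln 0 is taken as 0)
                total += o * math.log(o / e)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return 2.0 * total, df


g, df = g_statistic([[10, 8], [6, 4], [14, 8]])
print(round(g, 4), df)
```

For the brand table G comes out at about 0.269, almost identical to the Pearson chi-square value for the same table, and its significance is looked up in the same chi-square distribution.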

Another option worth mentioning is that for a chi-square test (such as Pearson and the G-test) some corrections have been suggested. These include the **Yates correction** (Yates, 1934), the **Williams correction** (Williams, 1976), and the **E.S. Pearson correction** (Pearson, 1947).
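As an illustration of what such a correction looks like, the Yates correction in its usual form subtracts 0.5 from each absolute difference between observed and expected count before squaring. It is normally applied to 2 by 2 tables only; the other two corrections are not shown here. Using $O_{i,j}$ for the observed and $E_{i,j}$ for the expected count in row $i$ and column $j$:

$$\chi^2_{\text{Yates}} = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{\left(\left|O_{i,j} - E_{i,j}\right| - 0.5\right)^2}{E_{i,j}}$$

The corrected value is smaller than the uncorrected one whenever the differences exceed 0.25 in absolute size, which makes the test more conservative.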

## How do you get that chi symbol (*χ*) in Word?

Type the letter 'c', then select it and change the font to 'Symbol'.
