Analysing a single nominal variable

Part 3b: Which percentages are different? (post-hoc pairwise binomial test)


In the previous part we saw that the percentages for marital status were not equal across the different categories. We would now like to know which categories differ from each other:

Is the percentage of Married significantly different from Widowed?
Is the percentage of Married significantly different from Divorced? etc.

To check each of these we can use three different tests: (1) a chi-square goodness-of-fit test again, but now using only the two chosen categories, (2) a so-called z-test (or t-test) for proportions, or (3) a so-called binomial test (Glenn_b, 2015). I will use the last one, the binomial test. We choose two categories and compare the percentages of only those two categories with each other. As with the Pearson chi-square goodness-of-fit test, we need an assumption about the population. For this binomial test we assume the two categories have an equal percentage in the population (all people who would fit in one of the two categories). This would be 50/50, so 50%, or, since statisticians prefer proportions, a probability of 0.5. The binomial test then determines the exact chance of obtaining a result at least as extreme as the one in our sample, if in the population the chance for each of the two categories is indeed 50%. The exact formula is shown in the Appendix below. As with our previous test, if this chance is very low (below .05) we conclude that the assumption about the population (in this case that the chance is 50%) is incorrect.
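As a minimal sketch of the test for one pair, the following Python snippet computes the exact two-sided binomial probability under the 50/50 assumption. The function name `binomial_test_equal` and the counts are made up for illustration; they are not the marital status counts from the example, and this is not necessarily how the software shown in the videos does the calculation.

```python
from math import comb

def binomial_test_equal(k, n):
    """Exact two-sided binomial test of the assumption p = 0.5.

    k is the count in one of the two categories, n the total of both.
    """
    # With p = 0.5 the distribution is symmetric, so the two-sided
    # p-value is twice the smaller tail, capped at 1.
    smaller = min(k, n - k)
    tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical counts, for illustration only (not the GSS data).
married, widowed = 12, 4
p_value = binomial_test_equal(married, married + widowed)
print(f"p = {p_value:.4f}")
```

If the printed value is below the chosen threshold, the 50/50 assumption for this pair would be rejected.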

Note that we have to perform 10 tests in this example to compare all possible pairs. If we use the usual 5% risk for each test, there is a very high chance (about 40%, if the tests were independent) that we make the wrong decision for at least one of them. Many different methods have been suggested to compensate for this, but one that is relatively straightforward is the Bonferroni procedure (Bonferroni, 1935): divide the 5% by the number of tests being done, and use the result as the criterion. In this example that means we divide 5% by 10, so the new threshold is 0.5% (i.e. 0.005). In general, if you have c categories, the number of pairs you can create is c × (c − 1) / 2 (in this example 5 × (5 − 1) / 2 = 10).
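The arithmetic in this paragraph can be checked with a few lines of Python. This is only a sketch: the 40% figure assumes the 10 tests are independent, which is a simplification, and the variable names are my own.

```python
alpha = 0.05  # usual risk per test
c = 5         # number of categories

# Number of pairs that can be formed from c categories.
n_tests = c * (c - 1) // 2

# Chance of at least one wrong rejection, assuming independent tests.
familywise = 1 - (1 - alpha) ** n_tests

# Bonferroni threshold: divide the risk by the number of tests.
bonferroni = alpha / n_tests

print(n_tests, round(familywise, 3), round(bonferroni, 4))
```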

In this example the highest significance (p-value) among all 10 possible tests is 0.003, which is still lower than 0.005. So even after the Bonferroni correction, all percentages are significantly different from each other. We could report this, for example, as follows:

The binomial test pairwise comparison with Bonferroni correction of marital status showed that all proportions were significantly different from each other (p < .003).
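The whole procedure (test every pair, then compare each result with the Bonferroni threshold) could be sketched in Python as follows. The helper `binomial_test_equal` is a hypothetical name, and the counts are made up and use only three categories; they are not the data behind the result reported above.

```python
from itertools import combinations
from math import comb

def binomial_test_equal(k, n):
    """Exact two-sided binomial test of the assumption p = 0.5."""
    smaller = min(k, n - k)
    tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical counts, for illustration only (not the GSS data).
counts = {"Married": 40, "Widowed": 5, "Divorced": 15}

alpha = 0.05
pairs = list(combinations(counts, 2))
threshold = alpha / len(pairs)  # Bonferroni-adjusted threshold

results = {}
for a, b in pairs:
    # Only the two chosen categories are used in each test.
    p = binomial_test_equal(counts[a], counts[a] + counts[b])
    results[(a, b)] = (p, p < threshold)
    print(f"{a} vs {b}: p = {p:.4g}, significant = {p < threshold}")
```

An equivalent approach is to multiply each p-value by the number of tests (capping at 1) and compare it with the original 5%; both lead to the same decisions.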

How to perform a pairwise binomial test

with Excel

Excel file from video: PH - Pairwise Binomial.xlsm.

with Python

Jupyter Notebook from video: PH - Pairwise Binomial.ipynb.

Data file from video: GSS2012a.csv.

with R (Studio)

R script from video: PH - Pairwise Binomial.R.

Data file from video: Pearson Chi-square independence.csv.

Jupyter Notebook: PH - Binomial Pairwise (R).ipynb.

with SPSS

Data file used in video: GSS2012-Adjusted.sav.

In some cases, because of the adjustment, none of the pairs in the post-hoc test shows a significant difference, even though the omnibus test indicated that there are significant differences. We can then only conclude that not all percentages are equal in the population, but unfortunately cannot pinpoint exactly which ones differ.

We are almost done with our analyses. One last thing is that with a very large sample size, even very small differences become significant. The question then becomes whether the difference is really relevant. To give some indication of this, an effect size is often also added, which we will discuss in the next part.

Sometimes a visualisation of the post-hoc analysis is also added, in a so-called hanging chi-gram (a suspended hanging rootogram) (Rice, 2006, p. 352). This is a bar chart of the standardized residuals. A standardized residual is the difference between the observed count and the expected count, divided by the square root of the expected count.
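A minimal Python sketch of the standardized residuals is shown below. It assumes equal expected counts (the same assumption as in the goodness-of-fit test) and uses made-up counts, not the GSS data. The resulting values are what the bars of a hanging chi-gram would show.

```python
from math import sqrt

# Hypothetical observed counts, for illustration only.
observed = {"Married": 40, "Widowed": 5, "Divorced": 15}

# Assumption: all categories are expected to be equally frequent.
total = sum(observed.values())
expected = total / len(observed)

# (observed - expected) / sqrt(expected) for each category.
residuals = {
    category: (count - expected) / sqrt(expected)
    for category, count in observed.items()
}

for category, value in residuals.items():
    print(f"{category}: {value:+.3f}")
```

A positive residual means a category occurs more often than expected, a negative one less often.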

