Nominal vs. Nominal
Part 2: Visualisation (clustered bar-chart)
On the previous page we got a first impression of the data using a cross table. On this page we'll see how we can visualise those results.
Two possible diagrams could be used, depending on your situation. If you have a clear dependent and independent variable, a so-called clustered bar-chart might be useful; if not, a spine plot could be a good choice.
One variable is independent and the other dependent if you think the first might influence the second, but not the other way around. Gender is often a good example of an independent variable, since your gender is unlikely to change depending on something else (unless you're doing biology and are using chromosomes which determine the gender of a baby). In the example used, gender might influence marital status, while marital status will not influence gender. I would therefore use a clustered bar-chart (also known as a multiple bar-chart) for the example, as shown in Figure 1.
Figure 1. Results of gender vs marital status.
Excel file: VI - Bar Chart - Clustered.xlsm
Note that in the example the column totals add up to 100% each, which makes it easy to compare the results between the two genders. Depending on your results, you might prefer to set each row to 100%, or even to base the percentages on the grand total.
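For those who would rather script this than use Excel, below is a minimal sketch in Python of how column percentages could be computed and then drawn as a clustered bar-chart. The counts are made up purely for illustration and are not the survey results; the function names column_percentages and plot_clustered are my own, and the plotting part assumes matplotlib is installed.

```python
# Hypothetical counts, made up purely for illustration (not the survey data).
# Outer keys are the columns (gender), inner keys the rows (marital status).
counts = {
    "Female": {"Married": 50, "Never married": 25, "Divorced": 15, "Widowed": 10},
    "Male": {"Married": 60, "Never married": 25, "Divorced": 10, "Widowed": 5},
}


def column_percentages(table):
    """Rescale each column (here: each gender) so that it adds up to 100%."""
    result = {}
    for column, cells in table.items():
        total = sum(cells.values())
        result[column] = {row: 100 * n / total for row, n in cells.items()}
    return result


def plot_clustered(percentages):
    """Draw a clustered bar-chart. Assumes matplotlib is installed."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it

    columns = list(percentages)
    rows = list(next(iter(percentages.values())))
    width = 0.8 / len(columns)  # bars of one cluster share 80% of the slot
    fig, ax = plt.subplots()
    for i, column in enumerate(columns):
        positions = [j + i * width for j in range(len(rows))]
        heights = [percentages[column][row] for row in rows]
        ax.bar(positions, heights, width=width, label=column)
    # Centre each tick label under its cluster of bars.
    ax.set_xticks([j + width * (len(columns) - 1) / 2 for j in range(len(rows))])
    ax.set_xticklabels(rows)
    ax.set_ylabel("Percent within gender")
    ax.legend(title="Gender")
    return fig


percentages = column_percentages(counts)
```

Calling plot_clustered(percentages) followed by matplotlib's show() should then display the chart. To use row percentages instead, the same function could be applied to a table with the rows and columns swapped.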
We can notice the same things as we saw with the cross table: most percentages seem similar, with the biggest differences between Male and Female at married and widowed.
In the report I recommend using an ‘Introduce – Show – Tell’ approach. When reporting this graph, it could for example look like this:
Often we hear that women are more likely to be widowed. To see if there is a relation between gender and marital status, we asked people about their marital status and gender. Figure 1 shows the results of the survey.
As can be seen in Figure 1, married is still by far the modal category for both males and females. The differences between males and females seem to be small (almost none for divorced), except for widowed, where there are relatively many females.
The big question is whether the sample shows sufficient evidence that the differences found might also appear in the population. This will be discussed on the next page.
If you do not have a clear independent and dependent variable, a spine plot, such as the one shown below, might be preferred.
It might take some time, but it is possible to create a spine plot with Excel, as shown in the video below.
Excel file: VI - Spine Plot.xlsm
with SPSS (not possible)
Unfortunately I am not familiar with a way to create a spine plot in SPSS using the GUI. It might be possible by using some syntax (linking perhaps to R), but that goes beyond the scope of this course. I'd suggest using MS Excel instead.
There is a macro from Wheeler that should make it possible: https://andrewpwheeler.com/2013/04/21/spineplots-in-spss/ , but I have not tested this myself.
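If neither Excel nor SPSS is convenient, the geometry of a spine plot can also be sketched in Python. In the sketch below each bar's width is the category's share of all cases, and each bar is stacked up to 100% by the other variable. The counts are again made up for illustration only, the function names spine_rectangles and plot_spine are my own, and the plotting part assumes matplotlib is installed.

```python
# Hypothetical counts, made up purely for illustration (not the survey data).
counts = {
    "Female": {"Married": 60, "Never married": 30, "Divorced": 20, "Widowed": 10},
    "Male": {"Married": 40, "Never married": 20, "Divorced": 15, "Widowed": 5},
}


def spine_rectangles(table):
    """Return (column, row, left, bottom, width, height) tuples on a 0-1 scale."""
    grand_total = sum(sum(cells.values()) for cells in table.values())
    rectangles = []
    left = 0.0
    for column, cells in table.items():
        column_total = sum(cells.values())
        width = column_total / grand_total  # bar width = share of all cases
        bottom = 0.0
        for row, n in cells.items():
            height = n / column_total  # segment height = share within the bar
            rectangles.append((column, row, left, bottom, width, height))
            bottom += height
        left += width
    return rectangles


def plot_spine(table):
    """Draw the rectangles as a spine plot. Assumes matplotlib is installed."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it

    rows = list(next(iter(table.values())))
    cmap = plt.get_cmap("tab10")
    colours = {row: cmap(i) for i, row in enumerate(rows)}
    fig, ax = plt.subplots()
    labelled = set()
    for column, row, left, bottom, width, height in spine_rectangles(table):
        # Only label the first segment of each row so the legend has no repeats.
        label = row if row not in labelled else None
        labelled.add(row)
        ax.bar(left, height, width=width, bottom=bottom, align="edge",
               color=colours[row], edgecolor="white", label=label)
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    ax.legend()
    return fig


rectangles = spine_rectangles(counts)
```

Because the width is the column's share of the grand total and the height is the share within that column, the area of each rectangle equals that cell's share of all cases.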
The naming of this diagram is unfortunately not very clear. I use the term 'spine plot' for a special case of a Mosaic Plot. Mosaic Plots are often attributed to Hartigan and Kleiner (for example by Friendly (2002, p. 90)), although earlier versions are known, for example Walker (1874, p. PI XX). Hartigan and Kleiner (1981) start their paper with a Mosaic Plot for a cross table, but end it by showing Mosaic Plots for multi-dimensional cross tables.
A Marimekko Chart is simply an alternative name for the Mosaic Plot, although according to Wikipedia "mosaic plots can be colored and shaded according to deviations from independence, whereas Marimekko charts are colored according to the category levels" (Wikipedia, 2022).
The term 'Spine Plot' itself is often attributed to Hummel, but I've been unable to hunt down his original article: Linked bar charts: Analysing categorical data graphically. Computational Statistics 11: 23–33.