Analysing two ordinal variables
Part 3: Test and effect size
On the previous page we saw that there appears to be some relation between what students think of the teacher's ability to link theory to practical situations and his/her ability to motivate the student. However, this was just a sample, so what does that mean for the entire population? There are various tests that could be used, but I will use Goodman-Kruskal gamma (γ) (Goodman & Kruskal, 1954). This tests whether a so-called monotonic relationship exists between two ordinal variables. Gamma uses so-called concordant and discordant pairs to check for this: a pair of cases is concordant if the case that scores higher on one variable also scores higher on the other, and discordant if it scores lower on the other.
In the example this test has a significance reported as .000 (i.e. p < .001), which is the chance of having a sample with a Gamma value of 0.877 or even higher, if in the population it would be 0. Since this chance is so low (below 0.050), we can conclude that in the population Gamma is most likely different from 0. Or you can say that, when predicting which of two cases scores higher on one variable, knowing their order on the other variable reduces the prediction errors by 87.7% (ignoring tied pairs).
Unfortunately there is no formal way to determine if 0.877 is high or low (although almost everyone would agree this is pretty high), and the rules of thumb floating around on the internet vary quite a lot, often depending on the field (e.g. biology, medicine, business, etc.). I will use the rule of thumb from Rea and Parker (1992):
0.00 < 0.10 - Negligible
0.10 < 0.20 - Weak
0.20 < 0.40 - Moderate
0.40 < 0.60 - Relatively strong
0.60 < 0.80 - Strong
0.80 < 1.00 - Very strong
Note that Gamma can range from -1 to +1 and the table above is for the absolute value of Gamma (so ignoring the minus sign, i.e. a Gamma of -0.15 could be interpreted as weak, since 0.15 falls in the 0.10 < 0.20 category).
In the example Gamma was 0.877, which would indicate a very strong effect.
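The rule of thumb above can be written as a small lookup. This is only a sketch: the function name `interpret_gamma` is made up for illustration, and treating a Gamma of exactly 1.00 as "very strong" is an assumption, since the table only lists values below 1.00.

```python
def interpret_gamma(gamma: float) -> str:
    """Label |gamma| using the Rea and Parker (1992) rule of thumb quoted above."""
    if not -1.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie between -1 and +1")
    size = abs(gamma)  # the sign only gives the direction, not the strength
    # (upper bound, label): each band runs up to, but not including, its upper bound
    bands = [
        (0.10, "negligible"),
        (0.20, "weak"),
        (0.40, "moderate"),
        (0.60, "relatively strong"),
        (0.80, "strong"),
    ]
    for upper, label in bands:
        if size < upper:
            return label
    return "very strong"  # 0.80 and above (including exactly 1.00 is an assumption)


print(interpret_gamma(0.877))  # very strong
print(interpret_gamma(-0.15))  # weak
```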
A positive Gamma (i.e. above 0) indicates a positive relation, which means that if someone scores high on one variable, s/he will most likely also score high on the other. A negative Gamma (i.e. below 0) indicates that if someone scores high on one variable, s/he will most likely score low on the other. It is important to check how each variable was coded. Most likely they were coded in the same direction, but if they weren't, the interpretation of Gamma is reversed. An example to illustrate this: let's say we have one variable that is coded as 1 = very good to 5 = very bad, and another variable that is coded as 1 = very bad to 5 = very good, and that Gamma between these two variables was -0.73. A negative Gamma means a negative association between the codes, so scoring high on one variable means scoring low on the other. However, since in this example a low score on one variable has the same meaning as a high score on the other, it is actually a positive association in terms of meaning.
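To see the effect of the coding direction, the sketch below computes Gamma from raw scores, recodes one variable so both run in the same direction, and computes Gamma again. The data and the helper name `gamma_from_scores` are hypothetical and only serve as an illustration. Gamma is computed here by counting concordant and discordant pairs directly.

```python
from itertools import combinations


def gamma_from_scores(x, y):
    """Goodman-Kruskal gamma from two equally long lists of ordinal codes."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # direction == 0 means a tie on at least one variable; gamma ignores ties
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical answers, made up for illustration only.
# var_a is coded 1 = very good ... 5 = very bad
# var_b is coded 1 = very bad  ... 5 = very good
var_a = [1, 2, 2, 3, 4, 5, 5, 1]
var_b = [5, 4, 5, 3, 2, 1, 2, 4]

gamma_raw = gamma_from_scores(var_a, var_b)

# Recode var_b so that both variables run from 1 = very good to 5 = very bad.
var_b_recoded = [6 - score for score in var_b]
gamma_recoded = gamma_from_scores(var_a, var_b_recoded)

print(gamma_raw, gamma_recoded)  # same size, opposite sign
```

Reversing the coding of one variable swaps the concordant and discordant pairs, so only the sign of Gamma changes and not its size.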
In the report we could add:
The Goodman-Kruskal gamma showed that there was a significant, very strong positive association between how motivational the teacher was and how well s/he was able to link the theory to practice, γ = .88, p < .001.
Manually (formulas and example)
Goodman-Kruskal gamma
Formulas
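The formula for Goodman-Kruskal gamma is:
$$\gamma = \frac{P - Q}{P + Q}$$
In this formula P and Q are defined as:
$$P = \sum_{i=1}^{r} \sum_{j=1}^{c} O_{ij} \times C_{ij}, \qquad Q = \sum_{i=1}^{r} \sum_{j=1}^{c} O_{ij} \times D_{ij}$$
With:
$$C_{ij} = \sum_{h<i} \sum_{k<j} O_{hk} + \sum_{h>i} \sum_{k>j} O_{hk}$$
And:
$$D_{ij} = \sum_{h<i} \sum_{k>j} O_{hk} + \sum_{h>i} \sum_{k<j} O_{hk}$$
Where $O_{hk}$ is the number of cases that scored h for the first variable, and k for the second, r is the number of rows and c the number of columns.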
Example
Note: This example is different from the one used in the other sections.
The observed values from two variables are shown below:
Oij | Variable2 | ||||
Variable1 | fully disagree | disagree | neutral | agree | fully agree |
fully disagree | 1 | 4 | 9 | 1 | 3 |
disagree | 0 | 1 | 2 | 2 | 1 |
neutral | 2 | 0 | 4 | 1 | 1 |
agree | 1 | 1 | 1 | 1 | 3 |
fully agree | 1 | 0 | 3 | 0 | 2 |
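The concordant scores (Cij) are given by the formula:
$$C_{ij} = \sum_{h<i} \sum_{k<j} O_{hk} + \sum_{h>i} \sum_{k>j} O_{hk}$$
where $O_{hk}$ is the observed count in row h and column k. This means it is the sum of all values in the cells to the upper-left and to the lower-right of the cell.
For the neutral-neutral cell this is the sum of the cells to the upper-left (1+4+0+1) and to the lower-right (1+3+0+2): 1+4+0+1+1+3+0+2 = 12.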
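The concordant score of every cell can also be computed with a short loop. The sketch below uses the observed table from this example; the function name `concordant_cells` is only chosen for illustration.

```python
# Observed counts from the example table (rows: Variable1, columns: Variable2).
observed = [
    [1, 4, 9, 1, 3],
    [0, 1, 2, 2, 1],
    [2, 0, 4, 1, 1],
    [1, 1, 1, 1, 3],
    [1, 0, 3, 0, 2],
]


def concordant_cells(table):
    """For each cell, sum the counts strictly above-left and strictly below-right."""
    n_rows, n_cols = len(table), len(table[0])
    result = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            upper_left = sum(table[h][k] for h in range(i) for k in range(j))
            lower_right = sum(
                table[h][k]
                for h in range(i + 1, n_rows)
                for k in range(j + 1, n_cols)
            )
            result[i][j] = upper_left + lower_right
    return result


concordant = concordant_cells(observed)
print(concordant[2][2])  # neutral-neutral cell: 12
for row in concordant:
    print(row)
```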
The results for all cells are:
Cij | Variable2 | ||||
Variable1 | fully disagree | disagree | neutral | agree | fully agree |
fully disagree | 23 | 21 | 11 | 7 | 0 |
disagree | 17 | 17 | 13 | 20 | 15 |
neutral | 11 | 11 | 12 | 22 | 20 |
agree | 5 | 8 | 10 | 25 | 27 |
fully agree | 0 | 4 | 10 | 26 | 31 |
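The discordant scores (Dij) are given by the formula:
$$D_{ij} = \sum_{h<i} \sum_{k>j} O_{hk} + \sum_{h>i} \sum_{k<j} O_{hk}$$
where $O_{hk}$ is the observed count in row h and column k. This means it is the sum of all values in the cells to the upper-right and to the lower-left of the cell. For the neutral-neutral cell this is the sum of the cells to the upper-right (1+3+2+1) and to the lower-left (1+1+1+0): 1+3+2+1+1+1+1+0 = 10.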
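The discordant scores follow the same pattern with the other two corners. Again a minimal sketch on the observed table of this example, with a function name (`discordant_cells`) picked only for illustration.

```python
# Observed counts from the example table (rows: Variable1, columns: Variable2).
observed = [
    [1, 4, 9, 1, 3],
    [0, 1, 2, 2, 1],
    [2, 0, 4, 1, 1],
    [1, 1, 1, 1, 3],
    [1, 0, 3, 0, 2],
]


def discordant_cells(table):
    """For each cell, sum the counts strictly above-right and strictly below-left."""
    n_rows, n_cols = len(table), len(table[0])
    result = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            upper_right = sum(
                table[h][k] for h in range(i) for k in range(j + 1, n_cols)
            )
            lower_left = sum(
                table[h][k] for h in range(i + 1, n_rows) for k in range(j)
            )
            result[i][j] = upper_right + lower_left
    return result


discordant = discordant_cells(observed)
print(discordant[2][2])  # neutral-neutral cell: 10
for row in discordant:
    print(row)
```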
The results for all cells are:
Dij | Variable2 | ||||
Variable1 | fully disagree | disagree | neutral | agree | fully agree |
fully disagree | 0 | 4 | 6 | 16 | 20 |
disagree | 17 | 17 | 9 | 16 | 15 |
neutral | 23 | 20 | 10 | 11 | 8 |
agree | 29 | 25 | 10 | 9 | 4 |
fully agree | 35 | 29 | 13 | 8 | 0 |
Now we can determine Pij = Oij × Cij for each cell:
Pij | Variable2 | ||||
Variable1 | fully disagree | disagree | neutral | agree | fully agree |
fully disagree | 1x23 = 23 | 4x21 = 84 | 9x11 = 99 | 1x7 = 7 | 3x0 = 0 |
disagree | 0x17 = 0 | 1x17 = 17 | 2x13 = 26 | 2x20 = 40 | 1x15 = 15 |
neutral | 2x11 = 22 | 0x11 = 0 | 4x12 = 48 | 1x22 = 22 | 1x20 = 20 |
agree | 1x5 = 5 | 1x8 = 8 | 1x10 = 10 | 1x25 = 25 | 3x27 = 81 |
fully agree | 1x0 = 0 | 0x4 = 0 | 3x10 = 30 | 0x26 = 0 | 2x31 = 62 |
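Then we need to sum all these up to get P (the row totals are 213, 98, 112, 129 and 92):
$$P = 213 + 98 + 112 + 129 + 92 = 644$$
Now the same but with the discordant results (Qij = Oij × Dij):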
Qij | Variable2 | ||||
Variable1 | fully disagree | disagree | neutral | agree | fully agree |
fully disagree | 1x0 = 0 | 4x4 = 16 | 9x6 = 54 | 1x16 = 16 | 3x20 = 60 |
disagree | 0x17 = 0 | 1x17 = 17 | 2x9 = 18 | 2x16 = 32 | 1x15 = 15 |
neutral | 2x23 = 46 | 0x20 = 0 | 4x10 = 40 | 1x11 = 11 | 1x8 = 8 |
agree | 1x29 = 29 | 1x25 = 25 | 1x10 = 10 | 1x9 = 9 | 3x4 = 12 |
fully agree | 1x35 = 35 | 0x29 = 0 | 3x13 = 39 | 0x8 = 0 | 2x0 = 0 |
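Then we need to sum all these up to get Q (the row totals are 146, 82, 105, 85 and 74):
$$Q = 146 + 82 + 105 + 85 + 74 = 492$$
Finally we can then determine gamma:
$$\gamma = \frac{P - Q}{P + Q} = \frac{644 - 492}{644 + 492} = \frac{152}{1136} \approx 0.134$$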
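The whole manual calculation of gamma can be checked with a few lines of code. The sketch below is one possible way to do it. The helper `pair_sums` is a made-up name, and it finds the upper-left/lower-right cells (concordant) and the upper-right/lower-left cells (discordant) by looking at the sign of the product of the row and column differences.

```python
# Observed counts from the example table (rows: Variable1, columns: Variable2).
observed = [
    [1, 4, 9, 1, 3],
    [0, 1, 2, 2, 1],
    [2, 0, 4, 1, 1],
    [1, 1, 1, 1, 3],
    [1, 0, 3, 0, 2],
]


def pair_sums(table):
    """Return (C, D): the per-cell concordant and discordant sums."""
    rows, cols = range(len(table)), range(len(table[0]))
    # (h - i) * (k - j) > 0: cell (h, k) lies upper-left or lower-right of (i, j)
    C = [[sum(table[h][k] for h in rows for k in cols if (h - i) * (k - j) > 0)
          for j in cols] for i in rows]
    # (h - i) * (k - j) < 0: cell (h, k) lies upper-right or lower-left of (i, j)
    D = [[sum(table[h][k] for h in rows for k in cols if (h - i) * (k - j) < 0)
          for j in cols] for i in rows]
    return C, D


C, D = pair_sums(observed)
cells = [(i, j) for i in range(len(observed)) for j in range(len(observed[0]))]

P = sum(observed[i][j] * C[i][j] for i, j in cells)  # sum of Oij x Cij
Q = sum(observed[i][j] * D[i][j] for i, j in cells)  # sum of Oij x Dij
gamma = (P - Q) / (P + Q)

print(P, Q, round(gamma, 4))  # 644 492 0.1338
```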
The test
Formulas
Unfortunately there seem to be three variations in the formula for the test statistic of Goodman-Kruskal gamma. The first is:
The second and third at first are the same:
But vary in how they determine the ASE. The software R uses:
While SPSS uses:
n in these formulas is the total sample size, all other variables are the same as in the Goodman-Kruskal gamma section.
Example
Note: Continuation of the Goodman-Kruskal gamma section.
The first formula can be completed almost immediately, since we already calculated all the needed values when calculating gamma itself, except for the total sample size, which is the sum of all observed counts:
$$n = 18 + 6 + 8 + 7 + 6 = 45$$
Then we can determine the approximate z-value:
For the second formula, we need to determine for each cell $O_{ij} \times (Q \times C_{ij} - P \times D_{ij})^2$, with Q = 492 and P = 644.
The calculations for each cell are shown below.
Variable2 | |||||
Variable1 | fully disagree | disagree | neutral | agree | fully agree |
fully disagree | 1x(492x23-644x0)^2 | 4x(492x21-644x4)^2 | 9x(492x11-644x6)^2 | 1x(492x7-644x16)^2 | 3x(492x0-644x20)^2 |
disagree | 0x(492x17-644x17)^2 | 1x(492x17-644x17)^2 | 2x(492x13-644x9)^2 | 2x(492x20-644x16)^2 | 1x(492x15-644x15)^2 |
neutral | 2x(492x11-644x23)^2 | 0x(492x11-644x20)^2 | 4x(492x12-644x10)^2 | 1x(492x22-644x11)^2 | 1x(492x20-644x8)^2 |
agree | 1x(492x5-644x29)^2 | 1x(492x8-644x25)^2 | 1x(492x10-644x10)^2 | 1x(492x25-644x9)^2 | 3x(492x27-644x4)^2 |
fully agree | 1x(492x0-644x35)^2 | 0x(492x4-644x29)^2 | 3x(492x10-644x13)^2 | 0x(492x26-644x8)^2 | 2x(492x31-644x0)^2 |
Which has the following results:
Variable2 | |||||
Variable1 | fully disagree | disagree | neutral | agree | fully agree |
fully disagree | 128051856 | 240622144 | 21566736 | 47059600 | 497683200 |
disagree | 0 | 6677056 | 720000 | 430592 | 5198400 |
neutral | 176720000 | 0 | 1149184 | 13987600 | 21977344 |
agree | 262958656 | 147962896 | 2310400 | 42302016 | 343983792 |
fully agree | 508051600 | 0 | 35748912 | 0 | 465247008 |
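Now we can determine the sum of all these cells:
$$\sum_{i} \sum_{j} O_{ij} \times (Q \times C_{ij} - P \times D_{ij})^2 = 2970408992$$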
And use this result to calculate:
Which gives a z-statistic of:
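The sketch below reproduces this second variant in code. The per-cell term matches the table above, but the leading factor is an assumption: the code takes ASE = 4 / (P + Q)² × √(Σ Oij × (Q × Cij - P × Dij)²), the commonly documented asymptotic standard error for gamma, and z = γ / ASE. The two-sided p-value from the standard normal distribution is also an addition for illustration, and `pair_sums` is a made-up helper name.

```python
import math

# Observed counts from the example table (rows: Variable1, columns: Variable2).
observed = [
    [1, 4, 9, 1, 3],
    [0, 1, 2, 2, 1],
    [2, 0, 4, 1, 1],
    [1, 1, 1, 1, 3],
    [1, 0, 3, 0, 2],
]


def pair_sums(table):
    """Return (C, D): the per-cell concordant and discordant sums."""
    rows, cols = range(len(table)), range(len(table[0]))
    C = [[sum(table[h][k] for h in rows for k in cols if (h - i) * (k - j) > 0)
          for j in cols] for i in rows]
    D = [[sum(table[h][k] for h in rows for k in cols if (h - i) * (k - j) < 0)
          for j in cols] for i in rows]
    return C, D


C, D = pair_sums(observed)
cells = [(i, j) for i in range(len(observed)) for j in range(len(observed[0]))]
P = sum(observed[i][j] * C[i][j] for i, j in cells)
Q = sum(observed[i][j] * D[i][j] for i, j in cells)
gamma = (P - Q) / (P + Q)

# Sum of Oij x (Q x Cij - P x Dij)^2 over all cells (the table above).
total = sum(observed[i][j] * (Q * C[i][j] - P * D[i][j]) ** 2 for i, j in cells)

# ASSUMPTION: leading factor 4 / (P + Q)^2, as in the commonly documented ASE.
ase = 4 / (P + Q) ** 2 * math.sqrt(total)
z = gamma / ase

# Two-sided p-value from the standard normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(total)         # 2970408992
print(round(z, 2))   # about 0.79
print(p_value)
```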
For the third formula we first need to determine for each cell $O_{ij} \times (C_{ij} - D_{ij})^2$.
The calculations for each cell are shown below.
Variable2 | |||||
Variable1 | fully disagree | disagree | neutral | agree | fully agree |
fully disagree | 1x(23-0)^2 | 4x(21-4)^2 | 9x(11-6)^2 | 1x(7-16)^2 | 3x(0-20)^2 |
disagree | 0x(17-17)^2 | 1x(17-17)^2 | 2x(13-9)^2 | 2x(20-16)^2 | 1x(15-15)^2 |
neutral | 2x(11-23)^2 | 0x(11-20)^2 | 4x(12-10)^2 | 1x(22-11)^2 | 1x(20-8)^2 |
agree | 1x(5-29)^2 | 1x(8-25)^2 | 1x(10-10)^2 | 1x(25-9)^2 | 3x(27-4)^2 |
fully agree | 1x(0-35)^2 | 0x(4-29)^2 | 3x(10-13)^2 | 0x(26-8)^2 | 2x(31-0)^2 |
Which has the following results:
Variable2 | |||||
Variable1 | fully disagree | disagree | neutral | agree | fully agree |
fully disagree | 529 | 1156 | 225 | 81 | 1200 |
disagree | 0 | 0 | 32 | 32 | 0 |
neutral | 288 | 0 | 16 | 121 | 144 |
agree | 576 | 289 | 0 | 256 | 1587 |
fully agree | 1225 | 0 | 27 | 0 | 1922 |
Now we can determine the sum of all these cells:
$$\sum_{i} \sum_{j} O_{ij} \times (C_{ij} - D_{ij})^2 = 9706$$
Using this in the formula for ASE:
Which gives a z-statistic of:
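The third variant can be sketched in the same way. The per-cell term again matches the table above, while the rest of the formula is an assumption: the code takes ASE = 2 / (P + Q) × √(Σ Oij × (Cij - Dij)² - (P - Q)² / n), the form commonly documented for the standard error of gamma under the null hypothesis, and z = γ / ASE. `pair_sums` is a made-up helper name.

```python
import math

# Observed counts from the example table (rows: Variable1, columns: Variable2).
observed = [
    [1, 4, 9, 1, 3],
    [0, 1, 2, 2, 1],
    [2, 0, 4, 1, 1],
    [1, 1, 1, 1, 3],
    [1, 0, 3, 0, 2],
]


def pair_sums(table):
    """Return (C, D): the per-cell concordant and discordant sums."""
    rows, cols = range(len(table)), range(len(table[0]))
    C = [[sum(table[h][k] for h in rows for k in cols if (h - i) * (k - j) > 0)
          for j in cols] for i in rows]
    D = [[sum(table[h][k] for h in rows for k in cols if (h - i) * (k - j) < 0)
          for j in cols] for i in rows]
    return C, D


C, D = pair_sums(observed)
cells = [(i, j) for i in range(len(observed)) for j in range(len(observed[0]))]
n = sum(sum(row) for row in observed)  # total sample size
P = sum(observed[i][j] * C[i][j] for i, j in cells)
Q = sum(observed[i][j] * D[i][j] for i, j in cells)
gamma = (P - Q) / (P + Q)

# Sum of Oij x (Cij - Dij)^2 over all cells (the table above).
total = sum(observed[i][j] * (C[i][j] - D[i][j]) ** 2 for i, j in cells)

# ASSUMPTION: ASE under the null hypothesis, 2 / (P + Q) x sqrt(total - (P - Q)^2 / n).
ase0 = 2 / (P + Q) * math.sqrt(total - (P - Q) ** 2 / n)
z = gamma / ase0

print(n, total)      # 45 9706
print(round(z, 2))   # about 0.79
```

With these assumed formulas the second and third variant give nearly the same z-value for this example, since the two standard errors differ only in the fourth decimal.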
We can complete the report now by combining all the parts, which will be shown on the next page.
FAQ:
Q: Why did you choose gamma?
A: Since this site focuses on survey data, and the ordinal variable is often based on a 5-point scale (e.g. fully agree to fully disagree), there is a risk of having many ties. As suggested at the bottom of the page here, with many ties Goodman-Kruskal gamma is preferred above