Analysing an ordinal and a scale variable
Part 3: Test and effect size for association
On the previous page we got a first impression from the sample data, and noticed there might be a relation between the teacher's ability to motivate the students and the grade the students gave the course. Several measures could be argued for to test whether this relation also exists in the population, but I suggest using Spearman's rho (ρ), also known as the Spearman rank correlation coefficient (Spearman, 1904).
Spearman's rho varies between -1 and +1. At -1 there is a perfect negative monotonic relationship, at 0 there is no monotonic relationship, and at +1 there is a perfect positive monotonic relationship. Monotonic means that the relation always goes in one direction: either always increasing or always decreasing, though not necessarily along a straight line. A positive relation means that if one variable goes up, the other also goes up (for example the number of ice creams sold versus temperature). A negative relation means that if one goes up, the other goes down (for example the number of winter jackets sold versus temperature).
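To make the idea of a monotonic relation concrete, here is a minimal sketch in plain Python. It uses the shortcut formula 1 - 6Σd²/(n(n² - 1)), which only holds when there are no tied scores. The function names and the data are made up for illustration and are not part of the course example.

```python
def simple_ranks(values):
    """Rank values from 1 (lowest) to n (highest); assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_no_ties(x, y):
    """Spearman's rho via the shortcut formula (only valid without ties)."""
    n = len(x)
    rank_x, rank_y = simple_ranks(x), simple_ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


x = [1, 2, 3, 4, 5]
increasing = [v ** 3 for v in x]        # curved, but always going up
decreasing = [100 - v ** 3 for v in x]  # curved, but always going down

rho_up = spearman_no_ties(x, increasing)
rho_down = spearman_no_ties(x, decreasing)
print(rho_up, rho_down)  # 1.0 -1.0
```

Although neither relation is a straight line, the ranks line up perfectly, so rho reaches its extreme values.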
We can test whether Spearman's rho differs significantly from 0 in the population. In the example the significance (p-value) is shown as 0.000, which means it is smaller than 0.001, not that it is exactly zero. This is the chance of obtaining a sample with a Spearman's rho of 0.787 or even more extreme, if in the population it were 0. Since this chance is so low (below 0.050), we can conclude that in the population Spearman's rho is most likely different from 0.
As mentioned before a positive Spearman’s rho (i.e. above 0) indicates a positive relation, which means that if someone scores high on one variable, s/he will most likely also score high on the other. A negative Spearman’s rho (i.e. below 0) indicates that if someone scores high on one variable, s/he will most likely score low on the other.
WARNING: It is important to check how the ordinal variable was coded. An example illustrates this. Let's say it was coded as 1 = very good to 5 = very bad, and Spearman's rho between this variable and a scale variable was -0.73. A negative Spearman's rho means a negative association, so scoring high on one variable goes with scoring low on the other. Here a high code means 'very bad', so the more positive the rating, the higher the score on the scale variable. If the ordinal variable had instead been coded as 1 = very bad to 5 = very good, the same -0.73 would mean that the more positive the rating, the lower the score on the scale variable.
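Why reversing the coding only flips the sign of rho can be sketched as follows. The notation is an assumption made for this illustration: r_{x_i} is the rank of case i on the ordinal variable, n the number of cases, and a prime marks the reversed coding (for a five-point item, x' = 6 - x).

```latex
% Reversing the coding reverses the ranks (average ranks for ties included),
% while the mean rank stays the same:
r'_{x_i} = (n + 1) - r_{x_i}, \qquad \bar{r}'_x = \bar{r}_x = \frac{n + 1}{2}

% Every deviation from the mean rank therefore changes sign:
r'_{x_i} - \bar{r}'_x = -\left(r_{x_i} - \bar{r}_x\right)

% The sum of squared deviations is unchanged and the numerator changes sign, so:
r'_s = -r_s
```

A rho of -0.73 under one coding would thus become +0.73 under the reversed coding, with the same strength and the same underlying relation.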
In this example it was coded as 1 = fully disagree to 5 = fully agree, so the positive association is indeed a positive one.
To determine the strength we only look at the absolute value (which means to ignore any minus sign, so the absolute value of for example -0.4 is simply 0.4).
Unfortunately there is no formal way to determine if 0.787 is high or low (although almost everyone would agree this is pretty high), and the rules of thumb floating around on the internet vary quite a lot, often depending on the field (e.g. biology, medicine, business). One example is the rule of thumb from Rea and Parker (1992):
0.00 to < 0.10: Negligible
0.10 to < 0.20: Weak
0.20 to < 0.40: Moderate
0.40 to < 0.60: Relatively strong
0.60 to < 0.80: Strong
0.80 to 1.00: Very strong
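The lookup in this rule of thumb can be sketched as a small Python function. The function name is made up for illustration, and treating a value of exactly 1.00 as 'very strong' is an assumption, since the list does not say where that boundary belongs.

```python
def strength_label(rho):
    """Label the strength of a correlation using the Rea and Parker bands."""
    size = abs(rho)  # the sign only gives the direction, not the strength
    if size > 1:
        raise ValueError("a correlation must lie between -1 and +1")
    bands = [
        (0.10, "negligible"),
        (0.20, "weak"),
        (0.40, "moderate"),
        (0.60, "relatively strong"),
        (0.80, "strong"),
    ]
    for upper, label in bands:
        if size < upper:
            return label
    return "very strong"  # assumption: 1.00 itself also counts as very strong


print(strength_label(0.787))  # strong
print(strength_label(-0.4))   # relatively strong
```

Because only the absolute value is used, -0.4 and 0.4 receive the same label.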
In this example we can therefore speak of a strong effect size. We can add this to our report:
Spearman's rho showed that there was a significant, strong association between how motivational the teacher was and the grade the students gave to the course, r_s = .787, p < .001.
Manually (formulas and example)
Formulas
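Spearman's rho is the Pearson correlation applied to the ranks. There are a few variations on the formula, but all should give the same result. One formula is:
$$r_s = \frac{\sum_{i=1}^{n} (r_{x_i} - \bar{r}_x)(r_{y_i} - \bar{r}_y)}{\sqrt{SS_{r_x} \times SS_{r_y}}}$$
In this formula $r_{x_i}$ is the i-th rank score of variable x, $\bar{r}_x$ the average of the rank scores of variable x, and $SS_{r_x}$ the sum of squared deviations from the mean rank of variable x:
$$SS_{r_x} = \sum_{i=1}^{n} (r_{x_i} - \bar{r}_x)^2$$
And the same for the y variable:
$$SS_{r_y} = \sum_{i=1}^{n} (r_{y_i} - \bar{r}_y)^2$$
In these formulas $n$ is the number of cases, which is the same for x and y since every case has a score on both.
The means of the ranks are calculated using:
$$\bar{r}_x = \frac{\sum_{i=1}^{n} r_{x_i}}{n}, \qquad \bar{r}_y = \frac{\sum_{i=1}^{n} r_{y_i}}{n}$$
The test statistic is a t-value, determined by:
$$t = r_s \sqrt{\frac{n - 2}{1 - r_s^2}}$$
And with degrees of freedom of:
$$df = n - 2$$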
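The calculation described in this section (a Pearson correlation on the ranks, followed by a t-value with n - 2 degrees of freedom) can be sketched in plain Python as below. The function names are made up for illustration, and the ranks are hypothetical; they are not the data from this page.

```python
import math


def spearman_from_ranks(rank_x, rank_y):
    """Pearson correlation applied to two equally long lists of ranks."""
    n = len(rank_x)
    mean_x = sum(rank_x) / n
    mean_y = sum(rank_y) / n
    ss_x = sum((r - mean_x) ** 2 for r in rank_x)  # sum of squares for x
    ss_y = sum((r - mean_y) ** 2 for r in rank_y)  # sum of squares for y
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    return numerator / math.sqrt(ss_x * ss_y)


def t_statistic(rho, n):
    """t-value and degrees of freedom for testing rho against 0.

    Not defined when rho is exactly -1 or +1 (division by zero).
    """
    df = n - 2
    t = rho * math.sqrt(df / (1 - rho ** 2))
    return t, df


# Hypothetical ranks for six cases (an assumption, not data from this page).
rank_x = [1, 2, 3, 4, 5, 6]
rank_y = [2, 1, 4, 3, 6, 5]

rho = spearman_from_ranks(rank_x, rank_y)
t, df = t_statistic(rho, len(rank_x))
print(round(rho, 3), round(t, 3), df)
```

The significance then follows from the t-distribution with df degrees of freedom. If SciPy happens to be available, `2 * scipy.stats.t.sf(abs(t), df)` would give a two-sided p-value.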
Example
Note: this example differs from the one used in the rest of this section.
We are given the scores of five people on two variables:
Note that the first person therefore scored a 1 on X and an 8 on Y. For each person we have a score on both X and Y.
Each variable has five scores, so:
$$n = 5$$
First we determine the average rank for each variable. Let's begin with X. The lowest value is 1, so this gets a rank of 1. Then there are three 2's, so these should get ranks 2, 3, and 4, which on average is rank 3. Then there is one 5, which gets rank 5. So substituting the ranks for the scores in X gives:
For Y each score is unique. The lowest score is a 6, so this gets a rank of 1, then the 7 a rank of 2, the 8 a rank of 3, the 9 a rank of 4 and the 10 a rank of 5. Substituting the ranks for the scores in Y gives:
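The ranking rule used for X and Y, where tied scores share the average of the ranks they occupy, can be sketched in Python as follows. The function name is made up for illustration. The X scores are the ones described in the text, but the order of the cases is an assumption.

```python
def average_ranks(values):
    """Assign ranks 1..n, giving tied values the mean of the ranks they span."""
    ordered = sorted(values)
    rank_of = {}
    for value in set(ordered):
        first = ordered.index(value) + 1          # lowest rank position of this value
        count = ordered.count(value)              # how often the value occurs
        rank_of[value] = first + (count - 1) / 2  # mean of first .. first + count - 1
    return [rank_of[v] for v in values]


# X scores as described in the text: one 1, three 2's and one 5.
# The case order is an assumption made for this sketch.
x = [1, 2, 2, 2, 5]
x_ranks = average_ranks(x)
print(x_ranks)  # [1.0, 3.0, 3.0, 3.0, 5.0]
```

For a variable without ties, such as Y, the same function simply returns the ranks 1 to n.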
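Next we can determine the average ranks of X and Y:
$$\bar{r}_x = \frac{1 + 3 + 3 + 3 + 5}{5} = 3, \qquad \bar{r}_y = \frac{1 + 2 + 3 + 4 + 5}{5} = 3$$
Then for the sum of squares (the terms are listed in rank order here, which does not change the sum):
$$SS_{r_x} = (1 - 3)^2 + (3 - 3)^2 + (3 - 3)^2 + (3 - 3)^2 + (5 - 3)^2 = 8$$
$$SS_{r_y} = (1 - 3)^2 + (2 - 3)^2 + (3 - 3)^2 + (4 - 3)^2 + (5 - 3)^2 = 10$$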
Now for the numerator (the part above the line in the fraction):
We can now determine Spearman's rho:
Next we can determine the t-value:
Finally the degrees of freedom:
$$df = n - 2 = 5 - 2 = 3$$
All done, so let's combine all the bits to create a full report on the next page.