Analysing a nominal and scale variable
Part 3a: Test for mean differences
From the sample data we would like to know if the differences in means might also appear in the population. In the example used so far we saw that there were differences in averages between the three locations, but this could simply be due to sampling error. The test to see if these differences might also occur in the population is the one-way ANOVA.
If the test results in a significance (p-value) below the pre-determined significance level (usually .05), we conclude that the nominal variable has an effect on the scale variable: most likely the mean of at least one category differs in the population from the mean of at least one other category.
In the example the significance is 0.001. This is the probability of obtaining a sample with an F value of 8.043 or higher if there were no differences between the three groups in the population. Since this probability is so low (below 0.050), we can conclude that in the population the location most likely influences the grade students gave.
In formal APA style we can report this result as:
The one-way ANOVA showed that Location had a significant effect on how students evaluated the course, F(2, 45) = 8.04, p = .001.
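The reported p-value can be checked from the F value and its two degrees of freedom. The sketch below is only an illustration, not the method used by any of the packages mentioned on this page. It assumes exactly three categories, because with 2 between-groups degrees of freedom the upper tail of the F distribution has the closed form (1 + 2F/df_within)^(-df_within/2). The function name is made up for this example; for other numbers of categories a statistics library would be needed (for instance scipy.stats.f.sf).

```python
def f_upper_tail_three_groups(f_value: float, df_within: int) -> float:
    """P(F >= f_value) for an F distribution with 2 and df_within degrees of freedom.

    Only valid when the between-groups degrees of freedom equal 2,
    i.e. when there are exactly three categories.
    """
    return (1 + 2 * f_value / df_within) ** (-df_within / 2)


# Values reported in the example: F(2, 45) = 8.043
p_value = f_upper_tail_three_groups(8.043, 45)
print(f"p = {p_value:.3f}")  # prints "p = 0.001"

alpha = 0.05  # pre-determined significance level
print("significant" if p_value < alpha else "not significant")
```

The result is below .05, which matches the conclusion drawn in the text.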
A one-way ANOVA can be performed with SPSS, R (Studio), Excel, Python, or manually.
Two ways to perform a One-way ANOVA with SPSS:
With Excel, you can either use the Data Analysis add-in to get static results (i.e. changes in the data will not be reflected in the results), or use Excel's built-in functions.
Manually (Formulas and example)
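The formula for the one-way ANOVA test statistic is:
$$F = \frac{MS_{between}}{MS_{within}}$$
In this formula MS is short for Mean Square, which is the mean of the squared deviations. Their formulas are:
$$MS_{between} = \frac{SS_{between}}{df_{between}}, \qquad MS_{within} = \frac{SS_{within}}{df_{within}}$$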
In these formulas, the SS is short for Sum of Squares, which in turn is short for Sum of Squared deviations from the mean, and df is short for degrees of freedom.
The formulas for the degrees of freedom are:
$$df_{between} = k - 1, \qquad df_{within} = n - k$$
In these formulas k is the number of categories, and n the total sample size.
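The formulas for the sums of squares are:
$$SS_{between} = \sum_{i=1}^{k} n_i \left(\bar{x}_i - \bar{x}\right)^2, \qquad SS_{within} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left(x_{i,j} - \bar{x}_i\right)^2$$
In these formulas $n_i$ is the number of scores in category i, $\bar{x}$ the mean of all scores, $\bar{x}_i$ the mean of the scores in the i-th category, and $x_{i,j}$ the j-th score in the i-th category.
Formulas for the means are:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \qquad \bar{x}_i = \frac{\sum_{j=1}^{n_i} x_{i,j}}{n_i}$$
Where, in the formula for the overall mean, $x_i$ is the i-th score of the full sample.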
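The manual calculation described in this section can be sketched in plain Python. This is a minimal illustration, not a replacement for a statistics package: the function name, the dictionary keys, and the scores used at the bottom are made up for this sketch and are not the data from any example on this page.

```python
from typing import Dict, Sequence


def one_way_anova(groups: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Compute SS, df, MS and F for a one-way ANOVA.

    `groups` holds one sequence of scores per category.
    """
    k = len(groups)                                   # number of categories
    n = sum(len(g) for g in groups)                   # total sample size
    overall_mean = sum(sum(g) for g in groups) / n    # mean of all scores
    group_means = [sum(g) / len(g) for g in groups]   # mean per category

    # SS between: squared distance of each category mean to the overall
    # mean, weighted by the number of scores in that category
    ss_between = sum(
        len(g) * (m - overall_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # SS within: squared distance of each score to its own category mean
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within

    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


# Hypothetical scores for three categories (illustration only)
result = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(result["F"])  # prints 3.0
```

For these made-up scores both sums of squares equal 6, the degrees of freedom are 2 and 6, and so F = (6 / 2) / (6 / 6) = 3.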
Note: this is a different example from the one used in the rest of this section.
We are given grades people gave to a brand, and grouped the people in three categories (local, regional, outside). The grades were:
The first category has 4 scores, the second 3, and the third 5. Therefore:
$$n_1 = 4, \quad n_2 = 3, \quad n_3 = 5, \quad n = 4 + 3 + 5 = 12, \quad k = 3$$
Let's begin with the overall mean:
The means for each category are:
Now for the SS within:
Let's do these sums one by one:
Using these three results we can determine the SS within:
Now for the SS between.
Then the degrees of freedom:
$$df_{between} = k - 1 = 3 - 1 = 2, \qquad df_{within} = n - k = 12 - 3 = 9$$
We can now determine the Mean Square:
Finally the F-statistic:
The test only shows there is an effect, but does not show which locations are significantly different from each other. To find this out, we should use a so-called post-hoc test, which will be the topic for the next page.
Q: What does ANOVA stand for?
A: The term ANOVA is short for ANalysis Of VAriance. It might seem a bit strange to look at variances instead of means, but by comparing the variance between the categories with the variance within the categories, something can be said about the means.
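To illustrate why variances say something about means, the sketch below splits the total sum of squared deviations into a part between the categories and a part within the categories. The scores are made up for this illustration. When the category means lie far apart, the between part is large relative to the within part, and that ratio is what the F value captures.

```python
from statistics import mean

# Made-up scores for three categories (illustration only)
groups = [[6, 7, 8], [5, 6, 7], [8, 9, 10]]
all_scores = [x for g in groups for x in g]
overall_mean = mean(all_scores)

# Total variation of every score around the overall mean
ss_total = sum((x - overall_mean) ** 2 for x in all_scores)
# Variation of the category means around the overall mean
ss_between = sum(len(g) * (mean(g) - overall_mean) ** 2 for g in groups)
# Variation of the scores around their own category mean
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

print(f"total   = {ss_total:.2f}")    # prints "total   = 20.00"
print(f"between = {ss_between:.2f}")  # prints "between = 14.00"
print(f"within  = {ss_within:.2f}")   # prints "within  = 6.00"
```

Here 14 + 6 = 20, so the two parts add up to the total variation.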
Q: Why is it called one-way ANOVA, is there also a two-way ANOVA?
A: Yes, there is also a two-way ANOVA. "One-way" refers to looking for the influence of one nominal variable on the scale variable, while with a two-way ANOVA you would look for the influence of two nominal variables on one scale variable.