Analysing a single nominal variable
Impression of the data (frequency table)
When analysing a single nominal variable, a good starting point is to generate a frequency table. A frequency table is defined as "a table showing (1) all of the values for a variable in a dataset, and (2) the frequency of each of those responses. Some frequency tables also show a cumulative frequency and proportions of responses" (Warne, 2017, p. 512). An example is shown in Table 1.
Table 1. Frequency table of marital status

| | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | Married | 972 | 49.2 | 50.1 | 50.1 |
| | Widowed | 181 | 9.2 | 9.3 | 59.4 |
| | Divorced | 314 | 15.9 | 16.2 | 75.6 |
| | Separated | 79 | 4.0 | 4.1 | 79.6 |
| | Never married | 395 | 20.0 | 20.4 | 100.0 |
| | Subtotal | 1941 | 98.3 | 100.0 | |
| Missing | No answer | 33 | 1.7 | | |
| | Subtotal | 33 | 1.7 | | |
| Total | | 1974 | 100.0 | | |
How to create a frequency table
with Excel
Excel file from video available here.
with Python
Jupyter Notebook used in video: here.
Data file used in video and notebook: GSS2012a.csv.
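If you prefer a plain script to the notebook, the sketch below builds the same columns as Table 1 using only Python's standard library. It is not the code from the video. The raw responses are rebuilt from the counts in Table 1, and treating 'No answer' as the only missing code is an assumption. If you use pandas, `Series.value_counts()` is a common starting point for the Frequency column.

```python
from collections import Counter

# Raw responses rebuilt from the counts in Table 1 (assumption: one string per respondent)
responses = (["Married"] * 972 + ["Widowed"] * 181 + ["Divorced"] * 314
             + ["Separated"] * 79 + ["Never married"] * 395 + ["No answer"] * 33)

MISSING = {"No answer"}  # assumption: the only missing-value code
ORDER = ["Married", "Widowed", "Divorced", "Separated", "Never married"]  # row order of Table 1


def frequency_table(values, order, missing):
    """Return rows of (category, frequency, percent, valid percent, cumulative percent)."""
    counts = Counter(values)
    total = len(values)  # grand total, missing answers included
    valid_total = total - sum(counts[m] for m in missing)  # missing answers excluded
    rows, running = [], 0.0
    for category in order:
        freq = counts[category]
        valid_pct = freq / valid_total * 100
        running += valid_pct  # running total of the valid percentages
        rows.append((category, freq, freq / total * 100, valid_pct, running))
    return rows


for category, freq, pct, valid_pct, cum_pct in frequency_table(responses, ORDER, MISSING):
    print(f"{category:<14}{freq:>6}{pct:>7.1f}{valid_pct:>7.1f}{cum_pct:>7.1f}")
```

The category order is passed in explicitly, so the rows appear in the same order as in Table 1 rather than sorted by frequency.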
with SPSS
There are three different ways to create a frequency table with SPSS.
using Frequencies
Watch the video below, or download the PDF instructions.
Datafile used in video: Holiday Fair.sav
using Custom Tables
Watch the video below, or download the PDF instructions for versions before 24 or for version 24.
Datafile used in video: Holiday Fair.sav
using descriptive shortcut
Watch the video below, or download the PDF instructions.
Datafile used in video: StudentStatistics.sav
The first two columns list the possible answers, grouped into Valid and Missing. Valid answers come from respondents who answered the question; missing ones come from those who didn't. Under Missing, a distinction is sometimes made between, for example, 'did not answer', 'answered incorrectly' and 'had to skip'. Typing errors made during data entry are also easy to spot here. If, for example, you once typed 'diforced' (or any value that does not exist), it would appear as a separate row with a frequency of only 1.
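You can also automate this check. The sketch below assumes the list of allowed answers is known in advance and flags every recorded value that is not on that list. The entered data and the misspelling 'diforced' are hypothetical examples.

```python
from collections import Counter

# Assumption: the complete set of valid answer labels, plus the missing code
ALLOWED = {"Married", "Widowed", "Divorced", "Separated", "Never married", "No answer"}


def unexpected_values(values, allowed):
    """Count the values that are not in the allowed set (e.g. typing errors)."""
    return Counter(v for v in values if v not in allowed)


# Hypothetical data entry containing one typing error
entered = ["Married", "Divorced", "diforced", "Never married", "Married"]
print(unexpected_values(entered, ALLOWED))  # Counter({'diforced': 1})
```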
The Frequency column shows how many respondents chose each option. We can tell that 972 people in this survey reported being married, and that 33 gave no answer. This is also known as the absolute frequency, defined as “the number of occurrences of a particular phenomenon” (Zedeck, 2014, p. 144). Notice that 'Married' is by far the most frequently selected option, and 'Separated' the least.
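Absolute frequencies are simply counts per category. The minimal sketch below uses `collections.Counter` on answers rebuilt from the valid counts in Table 1. It also picks out the most and least frequently chosen options.

```python
from collections import Counter

# Raw answers rebuilt from the valid counts in Table 1 (one string per respondent)
responses = (["Married"] * 972 + ["Widowed"] * 181 + ["Divorced"] * 314
             + ["Separated"] * 79 + ["Never married"] * 395)

counts = Counter(responses)      # absolute frequency per category
ranked = counts.most_common()    # (category, frequency) pairs, highest first
most_frequent, least_frequent = ranked[0], ranked[-1]
print(most_frequent)   # ('Married', 972)
print(least_frequent)  # ('Separated', 79)
```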
The Percent column shows the percentages based on the grand total, i.e. including the missing values. The 49.2 indicates that 49.2% of all respondents were married (you can check that 972 / 1974 × 100 = 49.2).
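The Percent column follows directly from this definition: divide the frequency by the grand total and multiply by 100. The sketch below reproduces the check in the text, using the subtotals from Table 1.

```python
def percent(frequency, total):
    """Percentage of `frequency` relative to `total`."""
    return frequency / total * 100


grand_total = 1941 + 33  # valid answers + missing answers (Table 1)
print(round(percent(972, grand_total), 1))  # 49.2
```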
The Valid Percent column shows the percentages based on the valid total, i.e. excluding the missing values. The 50.1 indicates that 50.1% of all those who answered this question reported being married. It is usually the Valid Percent that gets reported, and, confusingly, it is then often labelled simply as 'Percent'.
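The only difference from the Percent column is the denominator, because the missing answers are left out. The sketch below assumes that 'No answer' is the only missing code and uses the counts from Table 1.

```python
counts = {"Married": 972, "Widowed": 181, "Divorced": 314,
          "Separated": 79, "Never married": 395, "No answer": 33}
missing_codes = {"No answer"}  # assumption: the only missing-value code

# Valid total: all answers except the missing ones
valid_total = sum(f for cat, f in counts.items() if cat not in missing_codes)
valid_percent = {cat: f / valid_total * 100
                 for cat, f in counts.items() if cat not in missing_codes}
print(valid_total)                         # 1941
print(round(valid_percent["Married"], 1))  # 50.1
```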
Percentages can be defined as “a way of expressing ratios in terms of whole numbers. A ratio or fraction is converted to a percentage by multiplying by 100 and appending a "percentage sign" %” (Weisstein, 2002, p. 2200).
The Cumulative Percent is the running total of the Valid Percent. For a variable measured at the nominal level, however, it has no meaningful interpretation, because the categories have no natural order.
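To see why, note that the running total depends entirely on the order in which the categories are listed. The sketch below uses the valid counts from Table 1. Listed in the table's order, it reproduces the Cumulative Percent column. Listed alphabetically, an equally arbitrary order, it gives different values at every step except the last.

```python
from itertools import accumulate

# Valid counts from Table 1, in the table's row order
counts = {"Married": 972, "Widowed": 181, "Divorced": 314,
          "Separated": 79, "Never married": 395}
valid_total = sum(counts.values())


def cumulative_percent(order):
    """Running total of the valid percentages, for categories listed in `order`."""
    running = accumulate(counts[cat] for cat in order)
    return [round(r / valid_total * 100, 1) for r in running]


print(cumulative_percent(list(counts)))    # [50.1, 59.4, 75.6, 79.6, 100.0]
print(cumulative_percent(sorted(counts)))  # [16.2, 66.3, 86.6, 90.7, 100.0]
```

The running total is summed on the counts before dividing. This avoids the rounding drift you would get by adding already-rounded percentages.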
The frequency table itself is not often reported, since people tend to prefer visualisations. However, it is still good practice to create the table first, so we can check whether the visualisation was done properly. How to create a visualisation of this table is discussed in the next part.
FAQs
Q: What about 'relative frequencies'?
A: Percentages are actually a type of relative frequency. Relative frequencies are “[absolute frequency] expressed as a fraction of the total frequency” (Kenney & Keeping, 1954, p. 17). In other words, a relative frequency is the absolute frequency divided by the total frequency; multiplying it by 100 gives the percentage.
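In code, the relationship between the three is one division followed by one multiplication. The sketch below uses the 'Married' count and the total from Table 1.

```python
absolute = 972   # absolute frequency of 'Married'
total = 1974     # total frequency (all respondents)

relative = absolute / total   # relative frequency: a proportion between 0 and 1
percentage = relative * 100   # the same value expressed as a percentage
print(f"{relative:.3f} {percentage:.1f}%")  # 0.492 49.2%
```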
Q: How many decimal places should I round values to?
A: For percentages, rounding to one decimal place or to whole numbers is usually recommended (Cole, 2015).
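Rounding has one practical side effect: the rounded percentages do not always add up to exactly 100. The sketch below rounds the valid percentages from Table 1 to one decimal place and to whole numbers, then sums each set.

```python
# Valid counts from Table 1
counts = {"Married": 972, "Widowed": 181, "Divorced": 314,
          "Separated": 79, "Never married": 395}
valid_total = sum(counts.values())

exact = [f / valid_total * 100 for f in counts.values()]
one_decimal = [round(p, 1) for p in exact]  # e.g. 50.1, 9.3, ...
no_decimals = [round(p) for p in exact]     # e.g. 50, 9, ...

print(round(sum(one_decimal), 1))  # 100.1
print(sum(no_decimals))            # 99
```

This is why the individual values in the Valid Percent column of Table 1 add up to 100.1, while its subtotal reads 100.0.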