Frequency Table
A frequency table is defined as "a table showing (1) all of the values for a variable in a dataset, and (2) the frequency of each of those responses. Some frequency tables also show a cumulative frequency and proportions of responses" (Warne, 2017, p. 512). An example is shown in Table 1.
 | | Frequency | Percent | Valid Percent | Cumulative Percent |
---|---|---|---|---|---|
Valid | very scientific | 100 | 5.1 | 10.5 | 10.5 |
 | pretty scientific | 199 | 10.1 | 20.9 | 31.3 |
 | not too scientific | 348 | 17.6 | 36.5 | 67.8 |
 | not scientific at all | 307 | 15.6 | 32.2 | 100.0 |
 | Subtotal | 954 | 48.3 | 100.0 | |
Missing | No answer | 1020 | 51.7 | | |
 | Subtotal | 1020 | 51.7 | | |
Total | | 1974 | 100.0 | | |
How to create a frequency table with Excel, Python, R, or SPSS
with Python
Jupyter Notebook of video is available here.
with stikpetP library
without stikpetP library
with R (Studio)
with stikpetR library
Jupyter Notebook of video is available here.
without stikpetR library
R script of video is available here.
Datafile used in video: GSS2012-Adjusted.sav
with SPSS
There are three different ways to create a frequency table with SPSS.
An SPSS workbook with instructions for the first two can be found here.
using Frequencies
watch the video below, or download the pdf instructions (via bitly, opens in new window/tab).
Datafile used in video: Holiday Fair.sav
using Custom Tables
watch the video below, or download the pdf instructions for versions before 24, or version 24 (via bitly, opens in new window/tab)
Datafile used in video: Holiday Fair.sav
using descriptive shortcut
watch the video below, or download the pdf instructions (via bitly, opens in new window/tab).
Datafile used in video: StudentStatistics.sav
A frequency table can help to get an impression of survey data for a binary, nominal, or ordinal variable. It can also help with a scale variable, provided there are not too many different values. If, for example, you have asked for age, a list running from 1 to 90 with each age and its frequency will probably not be very helpful.
If a scale variable has many different values, the data is often binned (e.g. 0 < 10, 10 < 20, etc.). This creates an ordinal variable, for which a frequency table can then be helpful. See binning for more information on this.
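As a rough illustration of the binning idea, the sketch below groups a handful of ages into bins of equal width and counts them. The ages are made up for this example, and the names `ages`, `BIN_WIDTH`, and `bin_label` are hypothetical; it is only one possible way to do this with the Python standard library.

```python
from collections import Counter

# Hypothetical ages, made up for illustration only.
ages = [3, 17, 25, 25, 31, 38, 42, 47, 59, 64, 71, 88]

BIN_WIDTH = 10  # assumed width; bins are 0 < 10, 10 < 20, ...


def bin_label(lower, width=BIN_WIDTH):
    """Return a label such as '20 < 30' for the bin starting at lower."""
    return f"{lower} < {lower + width}"


# Count per lower bound so the bins can be sorted numerically.
counts = Counter((age // BIN_WIDTH) * BIN_WIDTH for age in ages)

# Bins without any observations are simply left out of the table.
binned_table = [(bin_label(lower), counts[lower]) for lower in sorted(counts)]

for label, frequency in binned_table:
    print(f"{label:>9}  {frequency}")
```

The result is a short ordinal frequency table instead of one row per distinct age.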
A frequency table can show different types of frequencies. Various options are discussed in the sections below.
(Absolute) Frequency
The column Frequency shows how many respondents answered each option. We can tell that 100 people in this survey chose the option 'very scientific'. This is also known as the absolute frequency and defined as “the number of occurrences of a particular phenomenon” (Zedeck, “Frequency”, 2014, p. 144).
(Valid) Percent and Relative
The Percent column shows the percentages, based on the grand total, so including the missing values. The 5.1 indicates that 5.1% of all respondents chose the 'very scientific' option (you can check that 100 / 1974 x 100 ≈ 5.1).
Percentages can be defined as “a way of expressing ratios in terms of whole numbers. A ratio or fraction is converted to a percentage by multiplying by 100 and appending a "percentage sign" %” (Weisstein, 2002, p. 2200).
The Valid Percent shows the percentage based on the valid total, so excluding the missing values. The 10.5 indicates that 10.5% of all those who answered this question chose the 'very scientific' option (100 / 954 x 100 ≈ 10.5). Most often the ‘Percent’ shown in reports is actually the Valid Percent, with the word ‘Valid’ simply left out.
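The difference between the two percentages can be checked with a few lines of Python. This is a minimal sketch using the frequencies from Table 1; the variable names are chosen for this example only.

```python
# Frequencies from Table 1; "No answer" is the missing category.
valid_counts = {
    "very scientific": 100,
    "pretty scientific": 199,
    "not too scientific": 348,
    "not scientific at all": 307,
}
missing_count = 1020

valid_total = sum(valid_counts.values())   # 954
grand_total = valid_total + missing_count  # 1974

rows = []
for category, frequency in valid_counts.items():
    percent = frequency / grand_total * 100        # includes missing values
    valid_percent = frequency / valid_total * 100  # excludes missing values
    rows.append((category, frequency, round(percent, 1), round(valid_percent, 1)))

for category, frequency, percent, valid_percent in rows:
    print(f"{category:<22} {frequency:>4} {percent:>5} {valid_percent:>5}")
```

The printed values should match the Percent and Valid Percent columns of Table 1.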
Percentages show the number of cases that could be expected if there were 100 cases in total, hence per-cent, which means 'per 100'. Be careful about using percentages if your sample size is very small. With fewer than 100 cases you are 'blowing up' the differences, whereas percentages are more commonly used to 'scale down'.
APA recommends reporting percentages with one or no decimals.
The term relative frequency is also sometimes used. This is the frequency divided by the total number of cases. Note that this should then always produce a decimal value between 0 and 1 (inclusive). Multiply this by 100 and you get the percentage, multiply it by 1000 and you get permille (‰), multiply it by 360 and you get the degrees of a circle, etc.
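The conversions mentioned above all start from the same relative frequency. A small sketch, assuming the 'very scientific' count and the valid total from Table 1 (using the grand total instead would work the same way):

```python
frequency = 100  # 'very scientific' in Table 1
total = 954      # valid total in Table 1 (assumed here; the grand total works the same way)

relative_frequency = frequency / total  # always between 0 and 1

percentage = relative_frequency * 100   # per 100
permille = relative_frequency * 1000    # per 1000
degrees = relative_frequency * 360      # e.g. the angle of a slice in a pie chart

print(round(relative_frequency, 3), round(percentage, 1),
      round(permille, 1), round(degrees, 1))
```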
Cumulative (Percent)
The cumulative frequency (not shown in the example table) can be defined as: “the total (absolute) frequency up to the upper boundary of that class” (Kenney, 1939, p. 16). This is only useful if the categories have an order, so that we can say, for example, that 299 respondents found accounting pretty scientific or even more so. For a nominal variable these cumulative frequencies have no meaningful interpretation (e.g. 28 students study business or less?).
The Cumulative Percent is the running total of the Valid Percent: the sum of the valid percentages of the current category and all previous ones. We can see that 31.3% of the respondents who answered this question thought accounting is pretty or very scientific.
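A running total is easy to compute with `itertools.accumulate`. The sketch below uses the valid frequencies from Table 1, in their ordinal order, and derives the cumulative percent from the cumulative frequency rather than by adding rounded percentages. Adding the rounded values 10.5 and 20.9 would give 31.4, while the unrounded calculation gives the 31.3 shown in the table.

```python
from itertools import accumulate

# Valid frequencies from Table 1, ordered from 'very scientific'
# to 'not scientific at all'.
frequencies = [100, 199, 348, 307]
valid_total = sum(frequencies)  # 954

cumulative_frequencies = list(accumulate(frequencies))
cumulative_percents = [
    round(cumulative / valid_total * 100, 1)
    for cumulative in cumulative_frequencies
]

print(cumulative_frequencies)
print(cumulative_percents)
```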
Density
When the categories are ranges of values (bins), the frequency density could become helpful. It can be defined as: “the number of occurrences of an event divided by the bin size…” (Zedeck, 2014, pp. 144–145).
In principle it is the frequency divided by the bin size (the upper bound minus the lower bound). It shows how 'dense' that particular category (bin) is. Table 2 shows an example.
Age | Frequency | bin size | Frequency Density |
---|---|---|---|
0 < 10 | 15 | 10 – 0 = 10 | 15 / 10 = 1.5 |
10 < 15 | 23 | 15 – 10 = 5 | 23 / 5 = 4.6 |
15 < 25 | 22 | 25 – 15 = 10 | 22 / 10 = 2.2 |
25 < 50 | 40 | 50 – 25 = 25 | 40 / 25 = 1.6 |
50 < 100 | 4 | 100 – 50 = 50 | 4 / 50 = 0.08 |
Note that if all the bins are the same size, there is not much point in determining the frequency density, since you'll be dividing each frequency by the same value.
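The calculation can be sketched in Python as follows. The bounds and counts are taken from the Age and Frequency columns of Table 2; the tuple layout and variable names are just one possible choice for this example.

```python
# (lower bound, upper bound, frequency) per bin, from the Age and
# Frequency columns of Table 2.
bins = [(0, 10, 15), (10, 15, 23), (15, 25, 22), (25, 50, 40), (50, 100, 4)]

densities = []
for lower, upper, frequency in bins:
    bin_size = upper - lower               # upper bound minus lower bound
    densities.append(frequency / bin_size)  # frequency per unit of age

for (lower, upper, frequency), density in zip(bins, densities):
    print(f"{lower} < {upper}: {frequency} / {upper - lower} = {density:.2f}")
```

The bin 10 < 15 has fewer cases than 25 < 50, but it is the densest bin because those cases are packed into only 5 years.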
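Instead of dividing each frequency by the bin size, you can also set a standard bin width and divide each frequency by the number of times that standard width fits into the bin. With a standard width of 5, for example, the bin 0 < 10 spans two standard widths, which gives 15 / 2 = 7.5 cases per 5 years.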
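A short sketch of the standard-width variant, again with the bounds and frequencies of Table 2. The standard width of 5 is an assumption made for this example; any other width would work the same way.

```python
STANDARD_WIDTH = 5  # assumed standard bin width, chosen for illustration

# (lower bound, upper bound, frequency) per bin, from Table 2.
bins = [(0, 10, 15), (10, 15, 23), (15, 25, 22), (25, 50, 40), (50, 100, 4)]

adjusted = []
for lower, upper, frequency in bins:
    # How many standard widths fit into this bin.
    times_standard = (upper - lower) / STANDARD_WIDTH
    adjusted.append(frequency / times_standard)

print(adjusted)
```

Each result equals the ordinary frequency density multiplied by the standard width, so the relative heights of the bins do not change.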
The relative frequency density can be calculated in two ways that give the same result. The first is to divide the frequency density by the total number of cases (Haighton, Haworth, & Wake, 2003, p. 74), the second is to divide the relative frequency by the bin size (Kozak, Kozak, Staudhammer, & Watts, 2008, p. 80).
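That both routes agree can be verified with exact fractions. The sketch below uses the bounds and frequencies of Table 2 and Python's `fractions.Fraction` to avoid floating-point rounding; the variable names are chosen for this example only.

```python
from fractions import Fraction

# (lower bound, upper bound, frequency) per bin, from Table 2.
bins = [(0, 10, 15), (10, 15, 23), (15, 25, 22), (25, 50, 40), (50, 100, 4)]
n = sum(frequency for _, _, frequency in bins)  # total number of cases

relative_densities = []
for lower, upper, frequency in bins:
    width = upper - lower
    via_density = Fraction(frequency, width) / n   # frequency density / total
    via_relative = Fraction(frequency, n) / width  # relative frequency / bin size
    assert via_density == via_relative
    relative_densities.append(via_density)

# Multiplying each relative frequency density by its bin size and summing
# gives back the relative frequencies, which add up to 1.
area = sum(
    density * (upper - lower)
    for density, (lower, upper, _) in zip(relative_densities, bins)
)
print(area)
```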
The binning itself is often done with a scale variable, since the frequency table would otherwise often be too long to give a good overview. See binning for more information on how to actually create bins from a scale variable.
Cumulative frequency densities are not often used and are even argued to be pointless to calculate (Petry & Friesen, 2012).
If you have open-ended bins (e.g. ‘below 20’, ‘65+’) you cannot determine the bin size, and therefore cannot determine the frequency density either.
Obtaining the Frequency Density
with Excel
Excel file from video: IM - Frequency Density (E).xlsm
using stikpetE
without using stikpetE
with Python
Notebook from video: IM - Frequency Density (P).ipynb
using stikpetP
without using stikpetP
with SPSS (somewhat)
Formula
The formula for the Frequency Density is:
\(FD_i = \frac{F_i}{CW_i}\)
With:
\(CW_i = UB_i - LB_i\)
\(F_i\) is the absolute frequency of category \(i\), \(CW_i\) the class width, \(LB_i\) the lower bound, and \(UB_i\) the upper bound.
The relative frequency density can be obtained using:
\(RFD_i = \frac{FD_i}{n} = \frac{RF_i}{CW_i}\)
With:
\(RF_i = \frac{F_i}{n}\)
\(n\) is the sample size, i.e. \(n = \sum_{i = 1}^k F_i\), where \(k\) is the number of categories. \(RF_i\) is the relative frequency of category \(i\).