Analysing a single nominal variable

Center and dispersion (mode and variation ratio)

A diagram can still be drawn in a way that misleads the reader. To guard against this, some numerical measures are often added. The two most commonly used types are measures of central tendency and measures of dispersion (or variability).

Central tendency (mode)

Note: click here if you would rather watch a video about the mode than read about it.

A measure of central tendency attempts to let one number represent the data as well as possible. The most commonly used measure of central tendency for a nominal variable is the mode. This is defined as “the abscissa corresponding to the ordinate of maximum frequency” (Pearson, 1895, p. 345). A more modern definition would be “the most common value obtained in a set of observations” (Weisstein, 2002). The word mode might even come from the French word 'mode', which means fashion. Fashion is what most people wear, so the mode is the option most people chose.

If one category has the highest frequency, it is the modal category; if two or more categories share the highest frequency, each of them is a mode. A set with only one mode is sometimes called unimodal, one with two modes bimodal, one with three trimodal, and so on. For two or more modes, the term multimodal can also be used.

In the example we can have a look at the frequency table of the marital status (Table 1).

Table 1
*Results of Marital Status*
		Frequency	Percent	Valid Percent	Cumulative Percent
Valid	Married	972	49.2	50.1	50.1
	Widowed	181	9.2	9.3	59.4
	Divorced	314	15.9	16.2	75.6
	Separated	79	4.0	4.1	79.6
	Never married	395	20.0	20.4	100.0
	Subtotal	1941	98.3	100.0
Missing	No answer	33	1.7
	Subtotal	33	1.7
Total		1974	100.0

The category 'Married' was chosen most often (972 times), so this will be the modal category.
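Reading the mode off a frequency table can also be sketched in a few lines of Python. The snippet below is only an illustration: the dictionary holds the valid frequencies from Table 1, and the function name `modal_categories` is a hypothetical helper, not something from the videos or notebooks linked on this page.

```python
# Frequencies of the valid answers, copied from Table 1.
marital_status = {
    "Married": 972,
    "Widowed": 181,
    "Divorced": 314,
    "Separated": 79,
    "Never married": 395,
}


def modal_categories(freqs):
    """Return every category whose frequency equals the highest frequency."""
    highest = max(freqs.values())
    return [category for category, count in freqs.items() if count == highest]


print(modal_categories(marital_status))  # ['Married']
```

Returning a list rather than a single value means that a bimodal or multimodal table simply yields more than one category.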

One small controversy arises when all categories have the same frequency. In this case none of them occurs more often than the others, so none of them would be the mode (see for example Spiegel & Stephens, 2008, p. 64; Larson & Farber, 2014, p. 69). On rare occasions someone might argue that if all categories have the same frequency, then all categories are modes, since they all have the highest frequency. A third interpretation was found on this site: if the frequency is one for each category, there is no mode, and if the frequency is the same but more than one, all of them are modes.
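The three interpretations can be put side by side in a small Python sketch. The convention labels "none", "all" and "split" are made up here purely for illustration, as is the function name, and a table with a single category is treated as having that category as its mode (an assumption, since the sources above do not discuss that case).

```python
def modes_by_convention(freqs, convention):
    """Return the modes of a frequency table under one tie convention.

    convention (hypothetical labels):
      "none"  - all frequencies equal: there is no mode
      "all"   - all frequencies equal: every category is a mode
      "split" - all equal to one: no mode; all equal and above one: all are modes
    """
    highest = max(freqs.values())
    tied = [category for category, count in freqs.items() if count == highest]
    # "All equal" only makes sense with at least two categories (assumption).
    all_equal = len(freqs) > 1 and len(tied) == len(freqs)

    if not all_equal:
        return tied
    if convention == "none":
        return []
    if convention == "all":
        return tied
    if convention == "split":
        return [] if highest == 1 else tied
    raise ValueError(f"unknown convention: {convention!r}")


flat = {"A": 3, "B": 3, "C": 3}
once = {"A": 1, "B": 1, "C": 1}

print(modes_by_convention(flat, "none"))   # []
print(modes_by_convention(flat, "all"))    # ['A', 'B', 'C']
print(modes_by_convention(flat, "split"))  # ['A', 'B', 'C']
print(modes_by_convention(once, "split"))  # []
```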

The mode can easily be spotted from a bar chart (the category or categories with the highest bar) or a frequency table (the category or categories with the highest frequency). Software packages can also determine the mode, but be careful about how they deal with multiple modes.
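As one illustration of why care is needed, Python's standard library offers two functions that behave differently when there is a tie. The sketch below assumes Python 3.8 or later; the small list of answers is invented for the example and is not taken from the data files on this page.

```python
import statistics

# Made-up raw answers with two categories tied for the highest frequency.
answers = ["Divorced", "Married", "Married", "Divorced", "Widowed"]

# statistics.mode returns only the first mode it encounters in the data.
first_mode = statistics.mode(answers)

# statistics.multimode returns all modes, in order of first appearance.
all_modes = statistics.multimode(answers)

print(first_mode)  # Divorced
print(all_modes)   # ['Divorced', 'Married']
```

A report based only on `statistics.mode` would silently hide the second mode here, so `statistics.multimode` is the safer choice when ties are possible.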

Click here to see how to determine the mode

with Excel

Excel file from video: CE - Mode.xlsm.

with Python

Jupyter Notebook from video: CE - Mode.ipynb.

Data used in video: GSS2012a.csv.

with R (Studio)

R script from video: CE - Mode.R.

Data used in video: ModeExamples.sav.

with SPSS

Data used in video: ModeExamples.sav.

Measurement of dispersion (Variation Ratio)

The typical value for a nominal variable is the mode, but besides knowing the most typical value, another important piece of information is how much variation there was. This is where measures of variability (or dispersion) come in. Although measures of dispersion are not often reported for nominal variables, I think there are more measures of dispersion for nominal variables than for any other measurement level. A quick look on Wikipedia shows almost 40 different measures for a single nominal variable.

It would be too much to go over all the different measures, especially since they are not often reported. If you are interested, an article by Kader and Perry (2007) can be a nice start; it is available here.

The easiest method is most likely the Variation Ratio (VR) (Freeman, 1965). This is simply the proportion of cases that does not belong to the modal category (Zedeck, 2014, p. 406). In the example in Table 1, 50.1% of the valid cases fall into the modal category of Married, and hence 49.9% do not. The Variation Ratio is therefore 49.9% (or 0.499).
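The same number can be checked with a short Python sketch. It assumes, as the text does by using the Valid Percent column, that the 33 missing answers are left out of the denominator; the second calculation is included only to show how much the result would shift if they were not. The function name `variation_ratio` is a hypothetical helper.

```python
# Valid frequencies and the number of missing answers, copied from Table 1.
valid = {
    "Married": 972,
    "Widowed": 181,
    "Divorced": 314,
    "Separated": 79,
    "Never married": 395,
}
missing = 33


def variation_ratio(freqs):
    """Proportion of cases that fall outside the (single) modal category."""
    total = sum(freqs.values())
    return 1 - max(freqs.values()) / total


vr_valid = variation_ratio(valid)

# For comparison only: dividing by all 1974 respondents instead of 1941.
vr_with_missing = 1 - max(valid.values()) / (sum(valid.values()) + missing)

print(round(vr_valid, 3))         # 0.499
print(round(vr_with_missing, 3))  # 0.508
```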

Click here to see how to determine the Variation Ratio

with Excel

Excel file from video: DI - Variation Ratio.xlsm.

with Python

Jupyter Notebook from video: DI - Variation Ratio.ipynb.

Data used in video: GSS2012a.csv.

with R (Studio)

R script from video: DI - Variation Ratio.R.

Data used in video: ModeExamples.sav.

with SPSS

Unfortunately, it is not possible (to my knowledge) to let SPSS determine the Variation Ratio. Luckily, the calculation is not too difficult. I'd suggest creating a frequency table with SPSS and then using the online calculator or Excel to determine the Variation Ratio.

Online calculator

Manually (formula and example)

Formula

\(VR =1-\frac{n_{max}\times F_{max}}{n}\)

Symbols used

VR = Variation Ratio
n_max = the number of times the maximum frequency occurs
F_max = the maximum frequency
n = the total sample size (i.e. the sum of all frequencies)

Example

From Table 1, the highest frequency is 972, so F_max = 972. This frequency occurs only once, so n_max = 1. The total of the valid frequencies is 1941, so n = 1941. Substituting these values into the formula gives:

\(VR=1-\frac{n_{max}\times F_{max}}{n} =1-\frac{1\times972}{1941} =\frac{1941}{1941}-\frac{972}{1941} =\frac{969}{1941}\)

\(=\frac{3\times323}{3\times647} =\frac{323}{647} \approx0.4992\)
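The worked example can be verified with exact fractions in Python. This is a minimal sketch of the formula above, with `variation_ratio_exact` as a hypothetical function name; the second call uses made-up frequencies to show the effect of n_max when two categories tie for the highest frequency.

```python
from fractions import Fraction


def variation_ratio_exact(frequencies):
    """VR = 1 - (n_max * F_max) / n, returned as an exact fraction."""
    f_max = max(frequencies)           # F_max: the maximum frequency
    n_max = frequencies.count(f_max)   # n_max: how often that maximum occurs
    n = sum(frequencies)               # n: the total sample size
    return 1 - Fraction(n_max * f_max, n)


# Valid frequencies from Table 1.
vr = variation_ratio_exact([972, 181, 314, 79, 395])
print(vr)                   # 323/647
print(round(float(vr), 4))  # 0.4992

# Made-up example with two modal categories (n_max = 2).
tied = variation_ratio_exact([40, 40, 20])
print(tied)                 # 1/5
```

Note that with tied modes this formula removes all modal categories, so it can give a lower value than one minus the proportion of a single modal category.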

Using descriptive statistics (the frequency table, the bar chart, the mode and the variation ratio) we have gotten a decent impression of the sample data, but what can it tell us about the population? For that we need to turn to inferential statistics, starting on the next page.
