Analyse a Single Binary Variable
The analysis of a single binary variable can be done with the steps shown below. Click on each step to reveal how it can be done.
Step 1: Impression
For a quick impression of a binary variable, you can create a frequency table. The result will be something like the table shown below.
|  |  | Frequency | Percent | Valid Percent |
|---|---|---|---|---|
| Valid | Female | 12 | 22 | 26 |
|  | Male | 34 | 62 | 74 |
|  | Subtotal | 46 | 84 | 100 |
| Missing | No response | 9 | 16 |  |
|  | Subtotal | 9 | 16 |  |
| Total |  | 55 | 100 |  |
Click here to see how to create a frequency table with Excel, Python, R, or SPSS.
with Python
Jupyter Notebook of video is available here.
with stikpetP library
without stikpetP library
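A minimal sketch without the stikpetP library could look as follows (the CSV path and the 'Gen_Gender' column are assumptions, based on the data file used in the later examples; pandas is still needed):

# library needed
import pandas as pd

# some data
myDf = pd.read_csv('../../Data/csv/StudentStatistics.csv', sep=';')
myVar = myDf['Gen_Gender']

# frequencies and percentages (including the missing values)
freq = myVar.value_counts(dropna=False)
percent = freq / len(myVar) * 100

# valid percentages (excluding the missing values)
validPercent = myVar.value_counts() / myVar.count() * 100

# combine everything into one frequency table
pd.DataFrame({'Frequency': freq, 'Percent': percent, 'Valid Percent': validPercent})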
with R (Studio)
with stikpetR library
Jupyter Notebook of video is available here.
without stikpetR library
R script of video is available here.
Datafile used in video: GSS2012-Adjusted.sav
with SPSS
There are three different ways to create a frequency table with SPSS.
An SPSS workbook with instructions for the first two can be found here.
using Frequencies
watch the video below, or download the pdf instructions (via bitly, opens in new window/tab).
Datafile used in video: Holiday Fair.sav
using Custom Tables
watch the video below, or download the pdf instructions for versions before 24, or version 24 (via bitly, opens in new window/tab)
Datafile used in video: Holiday Fair.sav
using descriptive shortcut
watch the video below, or download the pdf instructions (via bitly, opens in new window/tab).
Datafile used in video: StudentStatistics.sav
See the explanation of the frequency table for details on how to read this kind of table.
The table itself might not end up in the report, but it gives you a quick impression of the data.
We could add something like the following in the report based on the example:
There appear to be relatively many male workers (N = 34, 74%) compared to female workers (N = 12, 26%).
Step 2: Testing
With a single binary variable, you are probably interested in comparing the percentages of the two categories. An exact one-sample binomial test can do this for you. You can check whether the two percentages are significantly different, by using the assumption that they are equal in the population. The p-value (or significance) of the test can then show whether this is the case or not.
Click here to see how to perform a binomial test with Excel, Flowgorithm, Python, R, SPSS, or Manually
with Excel
Excel file from videos TS - Binomial (one-sample) (E).xlsm.
with stikpetE
without stikpetE
with Flowgorithm
A basic implementation of a one-sample binomial test is shown in the flowchart in figure 1.
Figure 1
Flowgorithm for one-sample binomial test
It takes as input the frequency of one of the categories (k) and the sample size (n), and makes use of the cumulative distribution function of the binomial distribution.
Flowgorithm file: TS - Binomial (one-sample).fprg.
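A rough Python sketch of the same logic (this is not the Flowgorithm file itself, and it assumes a two-sided test against an expected proportion of 0.5):

# sketch of a two-sided one-sample binomial test, using the binomial cumulative distribution function
from scipy.stats import binom

def binomial_test_onesample(k, n):
    # take the smaller of the two counts and double its tail probability
    # (this shortcut is only valid for an expected proportion of 0.5)
    kLow = min(k, n - k)
    return min(1, 2 * binom.cdf(kLow, n, 0.5))

# example with the frequencies from the frequency table (12 females out of 46 valid cases)
print(binomial_test_onesample(12, 46))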
with Python
Jupyter Notebook from videos TS - Binomial Exact Test.ipynb.
with stikpetP
with other libraries
without libraries
or without using any libraries:
Basic code example:
# libraries needed
import pandas as pd
from scipy.stats import binomtest

# some data
myDf = pd.read_csv('../../Data/csv/StudentStatistics.csv', sep=';')
myCd = myDf['Gen_Gender'].value_counts()

# the exact two-sided test against an expected proportion of 1/2
# (binomtest replaces the older binom_test, which was removed in recent SciPy versions)
binomtest(myCd.values[0], sum(myCd.values), 1/2, alternative='two-sided').pvalue
with R (Studio)
with stikpetR
Jupyter Notebook from video TS - Binomial (one-sample) (R).ipynb.
without stikpetR
R script from video TS - Binomial Exact Test.R.
Datafile used in video: StudentStatistics.sav
Basic code example:
#one-sample binomial test
#Preparation
#Getting some data
#install.packages("foreign")
library(foreign)
myData <- read.spss("../Data Files/StudentStatistics.sav", to.data.frame = TRUE)
#Remove NAs
myVar <- na.omit(myData$Gen_Gender)
#Determine the number of successes (the count of the first category)
k <- sum(myVar == myVar[1])
#Determine the total sample size
n <- length(myVar)
#Test with the default expectation that both groups are equal in the population
#Perform the binomial test
binom.test(k, n)
#Or use the binomial distribution directly:
#double the smaller tail, so this also works if k happens to be the larger count
min(1, 2 * pbinom(min(k, n - k), n, 0.5))
with SPSS
using non-parametric tests
Datafile used in video: StudentStatistics.sav
using Legacy Dialogs
Datafile used in video: StudentStatistics.sav
using compare means
Datafile used in video: StudentStatistics.sav
Manually (Formulas)
A one-sample binomial test is almost 'just' the same as using the binomial distribution directly.
Given a probability of success (p), which for the binomial test is the expected proportion in the population, the number of trials (n), which for the binomial test is the total sample size, and the number of successes (k), which for the binomial test is the number of occurrences in one of the categories, the formula for the cumulative binomial distribution (F(k; n, p)) is:
\(F\left(k;n,p\right)=\sum_{i=0}^{\left\lfloor k\right\rfloor}\binom{n}{i}\times p^{i}\times\left(1-p\right)^{n-i}\)
If p = 0.5 the formula can be simplified into:
\(F\left(k;n,0.5\right)=0.5^{n}\times\sum_{i=0}^{\left\lfloor k\right\rfloor}\binom{n}{i}\)
In the formula ⌊k⌋ is the 'floor' function. This gives the greatest integer (whole number) less than or equal to k. So for example ⌊2.8⌋ = 2, and ⌊-2.2⌋=-3.
\(\binom{n}{i}\) is the binomial coefficient, this can be calculated using:
\(\binom{n}{i}=\frac{n!}{i!\times\left(n-i\right)!}\)
In this formula the ! indicates the factorial operation:
\(n!=\prod_{i=1}^{n}i\), and 0! is defined as 0! = 1.
These formulas are discussed in more detail in the binomial distribution section.
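As a small check of these formulas with the example data (k = 12 females, n = 46 valid cases, p = 0.5), a sketch in plain Python could be:

# cumulative binomial distribution F(k; n, p), built directly from the formulas above
from math import comb, floor

def binom_cdf(k, n, p):
    # sum the binomial probabilities for i = 0, 1, ..., floor(k)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(floor(k) + 1))

# two-sided p-value: double the lower tail (12 is the smaller of the two counts)
k, n = 12, 46
pValue = 2 * binom_cdf(k, n, 0.5)
print(round(pValue, 3))  # 0.002, matching the reported result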
The p-value (sig.) is the probability of obtaining percentages as in the sample, or more extreme, if the assumption about the population (that the two categories are equal) were true. If this is below the pre-defined threshold (usually .05), we reject this assumption and conclude there is a significant difference; otherwise we do not reject the assumption.
When reporting the result of a one-sample binomial test, the only thing to show is the p-value (sig.), so for example:
An exact binomial test indicated that the percentage of females (Nf = 12, 26%) was significantly different from the percentage of males (Nm = 34, 74%), p = .002.
Note that the p-value is usually reported with three decimal places. If the p-value is below .0005, it is reported as p < .001. Note that older versions of SPSS often showed p-values below .0005 as .000.
Besides the one-sample binomial test, there are other tests that could be used. The binomial distribution can be approximated with a normal distribution, which leads to a one-sample proportion test. Another approach is to use a so-called goodness-of-fit test (either Pearson or Likelihood Ratio).
Statstest.com recommends using the exact binomial test if the sample size is below 1,000, and a Likelihood Ratio test with Yates correction otherwise. The Likelihood Ratio test is less well known, so if you prefer a more familiar test, the Pearson version (with Yates correction) should be fine as well.
Step 3: Effect Size
Each test should be accompanied by an effect size, according to APA (2019, p. 88). One possible effect size for a one-sample binomial test, where the assumption was that both categories were equal, is Cohen g.
Click here to see how to determine Cohen's g with Excel, Flowgorithm, Python, R, SPSS, an online calculator, or Manually
with Flowgorithm
A basic implementation of Cohen's g is shown in the flowchart in figure 2.
Figure 2
Flowgorithm for Cohen g
It takes as input the frequency of one of the categories (k) and the sample size (n).
Flowgorithm file: ES - Cohen g.fprg.
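A minimal sketch of the same calculation in Python (assuming the expected proportion is 0.5, as in the example):

# Cohen's g: the difference between the observed proportion and the expected proportion of 0.5
def cohen_g(k, n):
    return k / n - 0.5

# example with the frequencies from the frequency table (12 females out of 46 valid cases)
print(cohen_g(12, 46))       # about -0.24
print(abs(cohen_g(12, 46)))  # nondirectional version, about 0.24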
with R (Studio)
with stikpetR
Jupyter Notebook from video ES - Cohen g (R).ipynb.
without stikpetR
R script from video: binary - effect sizes.R.
Datafile used in video: StudentStatistics.sav
with SPSS
Datafile used in video: StudentStatistics.sav
Online calculator
Enter the number of cases of the first category, then the total sample size:
Manually (using Formula)
Given a sample proportion (p) and the expected proportion in the population (π), the formula for Cohen's g will be:
\(g=p-\pi\)
The sample proportion in the example was 0.26 and the expected proportion was 0.50, so in the example this gives:
\(g=0.26-0.50=-0.24\)
Often the absolute value is used (the so-called nondirectional Cohen's g):
\(g=|0.26-0.50|=|-0.24|=0.24\)
Cohen's g is simply the difference between the observed proportion and the expected proportion of 0.5. Cohen gave some rules of thumb to interpret this, shown in Table 1.
| \|g\| | Interpretation |
|---|---|
| 0.00 to < 0.05 | Negligible |
| 0.05 to < 0.15 | Small |
| 0.15 to < 0.25 | Medium |
| 0.25 or more | Large |

Note: Adapted from Statistical power analysis for the behavioral sciences (2nd ed., pp. 147-149), by J. Cohen, 1988, L. Erlbaum Associates.
The 0.24 would fall in the Medium category (but is very close to the Large category). We could add this to our findings:
An exact binomial test indicated that the percentage of females (Nf = 12, 26%) was significantly different from the percentage of males (Nm = 34, 74%), p = .002. Cohen's g suggests that the difference can be classified as medium, g = .24.
Alternatives to Cohen g could be Cohen h' or the Alternative Ratio.
Step 4: Reporting
In each step, we already discussed how it could be reported. For the example used, the final report could have something like the following:
There appear to be relatively many male workers (N = 34, 74%) compared to female workers (N = 12, 26%).
An exact binomial test indicated that the percentages were significantly different, p = .002. Cohen’s g suggests that the difference can be classified as medium, g = .24.
If you want to make things easy for yourself and are using Excel, Python or R, you can use my library/add-on to perform each step.
Using a stikpet Library/Add-On
Python and the stikpetP library
Jupyter Notebook from video: stikpetP - Single Binary.ipynb.
R and the stikpetR library
Jupyter Notebook from video: stikpetR - Single Binary.ipynb.