Analysing a binary vs. scale variable

Effect size: Cohen's d_s

When there is a significant difference, we might also want to check the ‘size’ of the difference. For example, if we had found a difference of 0.0003 in grades, then with an extremely large sample size this could still be significant, but it would not really be relevant. To measure the size of the difference we need a so-called effect size.
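To illustrate why a tiny difference can still be significant, here is a minimal Python sketch. It assumes two equally sized groups and a pooled standard deviation of 1, neither of which comes from the example data, and it uses the rough two-sided 5% cut-off of 1.96 for the t statistic. The function name is only illustrative.

```python
import math


def t_statistic_equal_groups(mean_difference, s_pooled, n_per_group):
    """t statistic of an independent samples t-test with equal group sizes."""
    # The standard error of the difference is s_pooled * sqrt(1/n + 1/n).
    standard_error = s_pooled * math.sqrt(2 / n_per_group)
    return mean_difference / standard_error


# Assumed values: a difference of 0.0003 and a pooled standard deviation of 1.
for n in (100, 10_000, 100_000_000):
    t = t_statistic_equal_groups(0.0003, 1.0, n)
    print(n, round(t, 4), "exceeds 1.96" if t > 1.96 else "below 1.96")
```

Under these assumptions the same difference of 0.0003 only passes the cut-off at the largest sample size, while the difference itself stays just as small.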

An appropriate effect size in the case of a binary and a scale variable is Cohen’s d_s (Cohen, 1988), although Hedges’ g (Hedges, 1981) might be preferred in case you have fewer than 20 respondents (Lakens, 2013).

Cohen’s d_s divides the difference between the two means by the so-called pooled standard deviation (Cohen, 1988, pp. 66-67). In the example this results in a Cohen’s d_s of 0.28. Cohen gives a rule of thumb for the interpretation, shown in Table 1.

Table 1
*Interpretation for Cohen’s d_s*
Cohen’s d_s	Interpretation
0.00 to < 0.20	very small
0.20 to < 0.50	small
0.50 to < 0.80	medium
0.80 or more	large
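The rule of thumb in Table 1 can be sketched as a small Python helper. The function name is hypothetical, and taking the absolute value first is an assumption, since the table only lists non-negative values while d_s itself can be negative depending on which category is subtracted from which.

```python
def interpret_cohens_d(d):
    """Return the Table 1 label for a Cohen's d_s value."""
    size = abs(d)  # assumption: only the magnitude matters for the label
    if size < 0.20:
        return "very small"
    if size < 0.50:
        return "small"
    if size < 0.80:
        return "medium"
    return "large"


print(interpret_cohens_d(0.28))
```

For the 0.28 from the example this gives "small".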

Below is how to determine Cohen's d_s with SPSS, R (studio), Python, an online calculator, or manually.

with SPSS

version 27+

versions prior to 27

Unfortunately, versions prior to 27 do not have an option in the GUI to determine Cohen's d. However, you can either use the output from the independent samples t-test and enter the results in the online calculator (see below), or use SPSS syntax. The video will show each option.

with R (studio)

with Python

Online calculator


Manually

Cohen's d_s formula is:

$d_{s}=\frac{\bar{x}_{1}-\bar{x}_{2}}{s_{pooled}}$

In this formula $\bar{x}_{i}$ is the mean for category i, which can be calculated by:

$\bar{x}_{i}=\frac{\sum_{j=1}^{n_{i}}x_{i,j}}{n_{i}}$

In this formula $x_{i,j}$ is the j-th score in category i, and $n_{i}$ is the number of cases in category i.

s_pooled is the pooled standard deviation. In formula notation this is:

$s_{pooled}=\sqrt{\frac{SS_{1}+SS_{2}}{N-2}}$

In this formula N is the total number of cases (combination of both categories), and SS_i the sum of squared differences with the mean, which in formula notation is:

$SS_{i}=\sum_{j=1}^{n_{i}}\left(x_{i,j}-\bar{x}_{i}\right)^{2}$
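As a minimal sketch, the calculation (difference between the two category means divided by the pooled standard deviation) could be written in plain Python as follows. The function names are hypothetical, and the two short score lists are made-up illustration data, not taken from this section.

```python
import math


def sum_of_squares(scores):
    """SS_i: sum of squared differences between each score and the category mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


def cohen_d_s(scores_1, scores_2):
    """Cohen's d_s: difference in means divided by the pooled standard deviation."""
    n_total = len(scores_1) + len(scores_2)  # N, both categories combined
    mean_1 = sum(scores_1) / len(scores_1)
    mean_2 = sum(scores_2) / len(scores_2)
    s_pooled = math.sqrt(
        (sum_of_squares(scores_1) + sum_of_squares(scores_2)) / (n_total - 2)
    )
    return (mean_1 - mean_2) / s_pooled


# Made-up data: means 6 and 3, SS of 8 in each category, so s_pooled = 2.
print(cohen_d_s([4, 6, 8], [1, 3, 5]))
```

The sign of the result depends on which category is treated as category 1, so swapping the two lists flips the sign but not the magnitude.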

An example.
Note: a different example than the one used in the rest of this section, to keep calculations a bit shorter.

Given are the scores of males (category 1) and females (category 2):

$X_{1}=\left\{8,3,2,1,1\right\}$

$X_{2}=\left\{7,7,5,3,9,8\right\}$

The first category has 5 scores, so n₁ = 5, and for the second category we have n₂ = 6. Let's begin by determining the mean per category:

$\bar{x}_{1}=\frac{\sum_{j=1}^{n_{1}}x_{1,j}}{n_{1}}=\frac{\sum_{j=1}^{5}x_{1,j}}{5}=\frac{8+3+2+1+1}{5}=\frac{15}{5}=3$

$\bar{x}_{2}=\frac{\sum_{j=1}^{n_{2}}x_{2,j}}{n_{2}}=\frac{\sum_{j=1}^{6}x_{2,j}}{6}=\frac{7+7+5+3+9+8}{6}=\frac{39}{6}=\frac{13}{2}=6.5$

Then we can determine the sum of squares per category. First the male category:

$SS_{1}=\sum_{j=1}^{n_{1}}\left(x_{1,j}-\bar{x}_{1}\right)^{2}=\sum_{j=1}^{5}\left(x_{1,j}-3\right)^{2}$

$=\left(8-3\right)^{2}+\left(3-3\right)^{2}+\left(2-3\right)^{2}+\left(1-3\right)^{2}+\left(1-3\right)^{2}$

$=\left(5\right)^{2}+\left(0\right)^{2}+\left(-1\right)^{2}+\left(-2\right)^{2}+\left(-2\right)^{2}=25+0+1+4+4=34$

And for the female category:

$SS_2=\sum_{j=1}^{n_2}\left(x_{2,j}-\bar{x}_2\right)^{2}=\sum_{j=1}^{6}\left(x_{2,j}-\frac{13}{2}\right)^{2}$

$=\left(7-\frac{13}{2}\right)^{2}+\left(7-\frac{13}{2}\right)^{2}+\left(5-\frac{13}{2}\right)^{2}+\left(3-\frac{13}{2}\right)^{2}+\left(9-\frac{13}{2}\right)^{2}+\left(8-\frac{13}{2}\right)^{2}$

$=\left(\frac{14}{2}-\frac{13}{2}\right)^{2}+\left(\frac{14}{2}-\frac{13}{2}\right)^{2}+\left(\frac{10}{2}-\frac{13}{2}\right)^{2}+\left(\frac{6}{2}-\frac{13}{2}\right)^{2}+\left(\frac{18}{2}-\frac{13}{2}\right)^{2}+\left(\frac{16}{2}-\frac{13}{2}\right)^{2}$

$=\left(\frac{14-13}{2}\right)^{2}+\left(\frac{14-13}{2}\right)^{2}+\left(\frac{10-13}{2}\right)^{2}+\left(\frac{6-13}{2}\right)^{2}+\left(\frac{18-13}{2}\right)^{2}+\left(\frac{16-13}{2}\right)^{2}$

$=\left(\frac{1}{2}\right)^{2}+\left(\frac{1}{2}\right)^{2}+\left(\frac{-3}{2}\right)^{2}+\left(\frac{-7}{2}\right)^{2}+\left(\frac{5}{2}\right)^{2}+\left(\frac{3}{2}\right)^{2}$

$=\frac{1}{4}+\frac{1}{4}+\frac{9}{4}+\frac{49}{4}+\frac{25}{4}+\frac{9}{4} =\frac{1+1+9+49+25+9}{4}$

$=\frac{94}{4}=\frac{47}{2}=23.5$

The pooled standard deviation therefore is:

$s_{pooled}=\sqrt{\frac{SS_{1}+SS_{2}}{N-2}}=\sqrt{\frac{34+\frac{47}{2}}{11-2}}$

$=\sqrt{\frac{\frac{68}{2}+\frac{47}{2}}{9}} =\sqrt{\frac{\frac{68+47}{2}}{9}} =\sqrt{\frac{\frac{115}{2}}{9}} =\sqrt{\frac{115}{2\times9}}$

$=\sqrt{\frac{115}{18}} =\frac{1}{18}\sqrt{115\times18} =\frac{1}{18}\sqrt{115\times2\times9} =\frac{3}{18}\sqrt{115\times2}$

$=\frac{3}{18}\sqrt{115\times2} =\frac{1}{6}\sqrt{230}\approx2.53$

And finally Cohen's d_s:

$d_s=\frac{\bar{x}_{1}-\bar{x}_{2}}{s_{pooled}} =\frac{3-\frac{13}{2}}{\frac{1}{6}\sqrt{230}} =\frac{\frac{6}{2}-\frac{13}{2}}{\frac{1}{6}\sqrt{230}} =\frac{\frac{6-13}{2}}{\frac{\sqrt{230}}{6}}$

$=\frac{\frac{6-13}{2}}{\frac{\sqrt{230}}{6}} =\frac{\frac{-7}{2}}{\frac{\sqrt{230}}{6}} =\frac{-7\times6}{2\times\sqrt{230}} =\frac{-7\times3}{\sqrt{230}} =\frac{-21}{\sqrt{230}}$

$=\frac{-21\times\sqrt{230}}{230} =-\frac{21}{230}\times\sqrt{230}\approx-1.38$
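The hand calculation above can be checked with a short Python sketch. It uses the standard library's `fractions.Fraction` so that the intermediate results stay exact and can be compared with the fractions in the derivation. The helper names are hypothetical.

```python
import math
from fractions import Fraction

males = [8, 3, 2, 1, 1]       # category 1
females = [7, 7, 5, 3, 9, 8]  # category 2


def exact_mean(scores):
    """Mean as an exact fraction."""
    return Fraction(sum(scores), len(scores))


def exact_sum_of_squares(scores):
    """Sum of squared differences with the mean, as an exact fraction."""
    mean = exact_mean(scores)
    return sum((x - mean) ** 2 for x in scores)


mean_1, mean_2 = exact_mean(males), exact_mean(females)
ss_1, ss_2 = exact_sum_of_squares(males), exact_sum_of_squares(females)

# Pooled variance (SS_1 + SS_2) / (N - 2); its square root is s_pooled.
pooled_variance = (ss_1 + ss_2) / (len(males) + len(females) - 2)
s_pooled = math.sqrt(pooled_variance)
d_s = float(mean_1 - mean_2) / s_pooled

print(mean_1, mean_2)                       # 3 13/2
print(ss_1, ss_2)                           # 34 47/2
print(pooled_variance)                      # 115/18
print(round(s_pooled, 2), round(d_s, 2))    # 2.53 -1.38
```

The negative sign only reflects that the mean of category 1 is lower than the mean of category 2.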

The 0.28 from the example used in the rest of this section would suggest a small effect.

