Module stikpetP.effect_sizes.eff_size_cohen_d
import pandas as pd
def es_cohen_d(nomField, scaleField, categories=None):
    '''
Cohen d
-------
An effect size measure for a one-way ANOVA. It takes the largest difference between any two category means and divides it by the within-groups standard deviation.
Note that Cohen d is most often reported with pairwise tests, but that measure is actually Cohen d<sub>z</sub>. That version is available as es_cohen_d_ps().
Parameters
----------
nomField : pandas series
data with categories
scaleField : pandas series
data with the scores
categories : list or dictionary, optional
the categories to use from nomField
Returns
-------
d : float
the Cohen d value
Notes
-----
The formula used (Cohen, 1988, p. 276):
$$d = \\frac{\\bar{x}_{max} - \\bar{x}_{min}}{\\sigma}$$
With:
$$\\sigma = \\sqrt{\\frac{SS_w}{n}}$$
$$SS_w = \\sum_{j=1}^k \\sum_{i=1}^{n_j} \\left(x_{i,j} - \\bar{x}_j\\right)^2$$
$$\\bar{x}_{max} = \\max\\left(\\bar{x}_1, \\bar{x}_2, \\dots, \\bar{x}_k\\right)$$
$$\\bar{x}_{min} = \\min\\left(\\bar{x}_1, \\bar{x}_2, \\dots, \\bar{x}_k\\right)$$
$$\\bar{x}_j = \\frac{\\sum_{i=1}^{n_j} x_{i,j}}{n_j}$$
*Symbols used:*
* \\(x_{i,j}\\), the i-th score in category j
* \\(n\\), the total sample size
* \\(n_j\\), the number of scores in category j
* \\(k\\), the number of categories
* \\(\\bar{x}_j\\), the mean of the scores in category j
* \\(SS_w\\), the within-groups (error) sum of squares, i.e. the variability within the groups
References
----------
Cohen, J. (1988). *Statistical power analysis for the behavioral sciences* (2nd ed.). L. Erlbaum Associates.
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
'''
    # accept plain lists as well as pandas series
    if isinstance(nomField, list):
        nomField = pd.Series(nomField)
    if isinstance(scaleField, list):
        scaleField = pd.Series(scaleField)

    data = pd.concat([nomField, scaleField], axis=1)
    data.columns = ["category", "score"]

    # remove unused categories
    if categories is not None:
        data = data[data.category.isin(categories)]

    # remove rows with missing values and reset the index
    data = data.dropna().reset_index(drop=True)

    # overall n, mean and total sum of squares
    n = len(data["category"])
    m = data.score.mean()
    sst = data.score.var() * (n - 1)

    # sample sizes and means per category
    nj = data.groupby('category').count()
    mj = data.groupby('category').mean()

    # between-groups sum of squares, then within-groups by subtraction
    ssb = (nj * (mj - m)**2)['score'].sum()
    ssw = sst - ssb

    # standard deviation as defined in the Notes: sqrt(SS_w / n)
    s = (ssw / n)**0.5

    # largest difference between category means, in units of s
    d = ((mj.max() - mj.min()) / s).iloc[0]
    return d
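The source above obtains SS_w indirectly, as the total sum of squares minus the between-groups sum of squares, whereas the Notes define it as a double sum over the groups. The following is a minimal sketch, on made-up toy data and with illustrative variable names that are not part of stikpetP, suggesting that the two routes agree:

```python
import pandas as pd

# Hypothetical toy data, for illustration only
df = pd.DataFrame({
    "category": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
    "score": [1, 2, 3, 4, 5, 6, 7, 8, 9],
})

n = len(df)
grand_mean = df["score"].mean()

# total sum of squares, as in the source: sample variance times (n - 1)
sst = df["score"].var() * (n - 1)

# between-groups sum of squares: n_j * (mean_j - grand mean)^2, summed over groups
grouped = df.groupby("category")["score"]
ssb = (grouped.count() * (grouped.mean() - grand_mean) ** 2).sum()

# within-groups sum of squares by subtraction (the route taken in the source)
ssw_indirect = sst - ssb

# within-groups sum of squares from the double sum in the Notes
ssw_direct = ((df["score"] - grouped.transform("mean")) ** 2).sum()

print(ssw_indirect, ssw_direct)
```

Both values should match up to floating-point rounding. The subtraction route avoids a second pass over the individual scores, at the cost of being slightly less obvious to read.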
Functions
def es_cohen_d(nomField, scaleField, categories=None)
Cohen d
An effect size measure for a one-way ANOVA. It takes the largest difference between any two category means and divides it by the within-groups standard deviation.
Note that Cohen d is most often reported with pairwise tests, but that measure is actually Cohen dz. That version is available as es_cohen_d_ps().
Parameters
nomField : pandas series - data with categories
scaleField : pandas series - data with the scores
categories : list or dictionary, optional - the categories to use from nomField
Returns
d : float - the Cohen d value
Notes
The formula used (Cohen, 1988, p. 276):
$$d = \frac{\bar{x}_{max} - \bar{x}_{min}}{\sigma}$$
With:
$$\sigma = \sqrt{\frac{SS_w}{n}}$$
$$SS_w = \sum_{j=1}^k \sum_{i=1}^{n_j} \left(x_{i,j} - \bar{x}_j\right)^2$$
$$\bar{x}_{max} = \max\left(\bar{x}_1, \bar{x}_2, \dots, \bar{x}_k\right)$$
$$\bar{x}_{min} = \min\left(\bar{x}_1, \bar{x}_2, \dots, \bar{x}_k\right)$$
$$\bar{x}_j = \frac{\sum_{i=1}^{n_j} x_{i,j}}{n_j}$$
Symbols used:
- \(x_{i,j}\), the i-th score in category j
- \(n\), the total sample size
- \(n_j\), the number of scores in category j
- \(k\), the number of categories
- \(\bar{x}_j\), the mean of the scores in category j
- \(SS_w\), the within-groups (error) sum of squares, i.e. the variability within the groups
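As an illustration of the formulas in these Notes, here is a minimal sketch that computes d directly from the definitions. The helper name `cohen_d_sketch` and the toy data are hypothetical and not part of stikpetP; the sketch assumes that rows with missing values are simply dropped.

```python
import pandas as pd


def cohen_d_sketch(categories, scores):
    """Hypothetical helper: Cohen d computed from the formulas in the Notes."""
    df = pd.DataFrame({"category": categories, "score": scores}).dropna()
    n = len(df)  # total sample size
    grouped = df.groupby("category")["score"]
    means = grouped.mean()  # mean of each category

    # SS_w: squared deviations of each score from its own category mean
    ss_w = ((df["score"] - grouped.transform("mean")) ** 2).sum()

    # sigma = sqrt(SS_w / n), dividing by n as in the Notes
    sigma = (ss_w / n) ** 0.5

    # largest minus smallest category mean, in units of sigma
    return (means.max() - means.min()) / sigma


# Made-up example data: three categories with three scores each
cats = ["low"] * 3 + ["mid"] * 3 + ["high"] * 3
vals = [10, 12, 14, 15, 15, 18, 20, 22, 24]

d = cohen_d_sketch(cats, vals)
print(round(d, 3))
```

By hand, the category means are 12, 16 and 22, SS_w = 8 + 6 + 8 = 22 and n = 9, so d = 10 / sqrt(22/9), roughly 6.396. Note that sigma divides SS_w by n rather than by n - k, following the formula as given above.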
References
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). L. Erlbaum Associates.
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076