Module stikpetP.effect_sizes.eff_size_cohen_d

import pandas as pd

def es_cohen_d(nomField, scaleField, categories=None):
    '''
    Cohen d
    -------
    An effect size measure for a one-way ANOVA. It takes the largest difference between any two category means and divides it by the within-group standard deviation.
    
    Note that Cohen d is most often reported with pairwise tests, but that is actually Cohen d<sub>z</sub>. That version is available using es_cohen_d_ps().
    
    Parameters
    ----------
    nomField : list or pandas series
        data with categories
    scaleField : list or pandas series
        data with the scores
    categories : list or dictionary, optional
        the categories to use from nomField
        
    Returns
    -------
    d : float
        the Cohen d value
    
    Notes
    -----
    The formula used (Cohen, 1988, p. 276):
    $$d = \\frac{\\bar{x}_{max} - \\bar{x}_{min}}{\\sigma}$$
    
    With:
    $$\\sigma = \\sqrt{\\frac{SS_w}{n}}$$
    $$SS_w = \\sum_{j=1}^k \\sum_{i=1}^{n_j} \\left(x_{i,j} - \\bar{x}_j\\right)^2$$
    $$\\bar{x}_{max} = \\max\\left(\\bar{x}_1, \\bar{x}_2, \\dots, \\bar{x}_k\\right)$$
    $$\\bar{x}_{min} = \\min\\left(\\bar{x}_1, \\bar{x}_2, \\dots, \\bar{x}_k\\right)$$
    $$\\bar{x}_j = \\frac{\\sum_{i=1}^{n_j} x_{i,j}}{n_j}$$
    
    *Symbols used:*
    
    * \\(x_{i,j}\\), the i-th score in category j
    * \\(n\\), the total sample size
    * \\(n_j\\), the number of scores in category j
    * \\(k\\), the number of categories
    * \\(\\bar{x}_j\\), the mean of the scores in category j
    * \\(SS_w\\), the within-groups (error) sum of squares, i.e. the variability within the groups
    
    References
    ----------
    Cohen, J. (1988). *Statistical power analysis for the behavioral sciences* (2nd ed.). L. Erlbaum Associates.
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076
    
    '''
    
    if type(nomField) == list:
        nomField = pd.Series(nomField)
        
    if type(scaleField) == list:
        scaleField = pd.Series(scaleField)
        
    data = pd.concat([nomField, scaleField], axis=1)
    data.columns = ["category", "score"]
    
    #remove unused categories
    if categories is not None:
        data = data[data.category.isin(categories)]
    
    #Remove rows with missing values and reset index
    #(reset_index returns a new frame, so the result has to be assigned)
    data = data.dropna().reset_index(drop=True)
    
    #overall n, mean and total sum of squares
    n = len(data)
    m = data.score.mean()
    sst = data.score.var()*(n-1)
    
    #sample sizes and means per category
    nj = data.groupby('category').count()
    mj = data.groupby('category').mean()
    
    #between-groups sum of squares, and within-groups as the remainder
    ssb = (nj*(mj-m)**2)['score'].sum()
    ssw = sst - ssb
    
    #within-groups standard deviation (divides by n, not n - k)
    s = (ssw/n)**0.5
    d = ((mj.max() - mj.min())/s).iloc[0]
    
    return d

Functions

def es_cohen_d(nomField, scaleField, categories=None)

Cohen d

An effect size measure for a one-way ANOVA. It takes the largest difference between any two category means and divides it by the within-group standard deviation.

Note that Cohen d is most often reported with pairwise tests, but that is actually Cohen d_z. That version is available using es_cohen_d_ps().

Parameters

nomField : list or pandas series
data with categories
scaleField : list or pandas series
data with the scores
categories : list or dictionary, optional
the categories to use from nomField

Returns

d : float
the Cohen d value
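
The sketch below illustrates how the inputs appear to be prepared before the calculation, following the steps in the source code: the two fields are combined, restricted to the requested categories, and rows with missing values are dropped. The example values and the choice of categories are hypothetical.

```python
import pandas as pd

# Hypothetical input: one missing category label and one missing score
nomField = pd.Series(["a", "a", "b", "b", "c", None])
scaleField = pd.Series([1.0, 2.0, 3.0, None, 5.0, 6.0])

# combine both fields into one frame with fixed column names
data = pd.concat([nomField, scaleField], axis=1)
data.columns = ["category", "score"]

# keep only the categories of interest (here "a" and "b")
data = data[data["category"].isin(["a", "b"])]

# drop incomplete rows and renumber the remaining ones
data = data.dropna().reset_index(drop=True)
```

Only complete rows from the selected categories remain, so category "c" and the row with the missing score play no part in the effect size.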

Notes

The formula used (Cohen, 1988, p. 276):
$$d = \frac{\bar{x}_{max} - \bar{x}_{min}}{\sigma}$$

With:
$$\sigma = \sqrt{\frac{SS_w}{n}}$$
$$SS_w = \sum_{j=1}^k \sum_{i=1}^{n_j} \left(x_{i,j} - \bar{x}_j\right)^2$$
$$\bar{x}_{max} = \max\left(\bar{x}_1, \bar{x}_2, \dots, \bar{x}_k\right)$$
$$\bar{x}_{min} = \min\left(\bar{x}_1, \bar{x}_2, \dots, \bar{x}_k\right)$$
$$\bar{x}_j = \frac{\sum_{i=1}^{n_j} x_{i,j}}{n_j}$$

Symbols used:

  • \(x_{i,j}\), the i-th score in category j
  • \(n\), the total sample size
  • \(n_j\), the number of scores in category j
  • \(k\), the number of categories
  • \(\bar{x}_j\), the mean of the scores in category j
  • \(SS_w\), the within-groups (error) sum of squares, i.e. the variability within the groups
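
As a hand-checkable illustration of the formula above, the following sketch computes d directly from the definitions using pandas. The data values and variable names are made up for illustration and are not part of the library. It also checks that the within-groups sum of squares equals the total minus the between-groups sum of squares, which is the route the source code takes.

```python
import pandas as pd

# Hypothetical example data: three categories with three scores each
data = pd.DataFrame({
    "category": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
    "score": [1, 2, 3, 2, 3, 4, 6, 7, 8],
})

n = len(data)
groups = data.groupby("category")["score"]
means = groups.mean()  # category means: a = 2, b = 3, c = 7

# SS_w: squared deviations of each score from its own category mean
ss_w = ((data["score"] - groups.transform("mean")) ** 2).sum()

# the same quantity obtained as SS_total - SS_between
grand_mean = data["score"].mean()
ss_t = ((data["score"] - grand_mean) ** 2).sum()
ss_b = (groups.count() * (means - grand_mean) ** 2).sum()

# sigma divides SS_w by the total sample size n
sigma = (ss_w / n) ** 0.5
d = (means.max() - means.min()) / sigma
```

For this data SS_w = 6 and n = 9, so sigma = sqrt(6/9) and d = (7 - 2) / sqrt(6/9), which is roughly 6.12.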

References

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). L. Erlbaum Associates.

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
