Module stikpetP.effect_sizes.eff_size_glass_delta

from statistics import mean, variance
import pandas as pd

def es_glass_delta(catField, scaleField, categories=None, dmu=0, control=None):
    '''
    Glass Delta
    -----------
    An effect size measure when comparing two means, with a specified control group.

    The measure is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/EffectSizes/GlassDelta.html)
    
    Parameters
    ----------
    catField : dataframe or list 
        the categorical data
    scaleField : dataframe or list
        the scores
    categories : list, optional 
        to indicate which two categories of catField to use; otherwise the two most frequent categories are used.
    dmu : float, optional 
        difference according to null hypothesis (default is 0). Currently not used in the calculation.
    control : string or float, optional 
        to indicate which category to use as control group. Default is the second of the two categories.
    
    Returns
    -------
    Glass Delta value
    
    Notes
    -----
    The formula used is (Glass, 1976, p. 7):
    $$\\delta = \\frac{\\bar{x}_1 - \\bar{x}_2}{s_2}$$
    
    With:
    $$s_2 = \\sqrt{\\frac{\\sum_{j=1}^{n_2} \\left(x_{2,j} - \\bar{x}_2\\right)^2}{n_2 - 1}}$$
    $$\\bar{x}_i = \\frac{\\sum_{j=1}^{n_i} x_{i,j}}{n_i}$$
    
    *Symbols used:*
    
    * \\(x_{i,j}\\) the j-th score in category i
    * \\(n_i\\) the number of scores in category i
    
    Glass actually uses a ‘control group’, and \\(s_2\\) is then the standard deviation of the control group.

    Before, After and Alternatives
    ------------------------------
    Before the effect size you might want to run a test. Various options include [ts_student_t_os](../tests/test_student_t_os.html#ts_student_t_os) for One-Sample Student t-Test, [ts_trimmed_mean_os](../tests/test_trimmed_mean_os.html#ts_trimmed_mean_os) for One-Sample Trimmed (Yuen or Yuen-Welch) Mean Test, or [ts_z_os](../tests/test_z_os.html#ts_z_os) for One-Sample Z Test.

    Unfortunately, I've been unable to find any rule-of-thumb specifically for Glass Delta, but most likely the ones from Cohen d should be a decent alternative. These are available with [th_cohen_d()](../other/thumb_cohen_d.html)
    
    Alternative effect sizes include: [Common Language](../effect_sizes/eff_size_common_language_is.html), [Cohen d_s](../effect_sizes/eff_size_hedges_g_is.html), [Cohen U](../effect_sizes/eff_size_cohen_u.html), [Hedges g](../effect_sizes/eff_size_hedges_g_is.html)
    
    or the correlation coefficients: [biserial](../correlations/cor_biserial.html), [point-biserial](../effect_sizes/cor_point_biserial.html)
    
    References 
    ----------
    Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. *Educational Researcher, 5*(10), 3–8. https://doi.org/10.3102/0013189X005010003
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076
    
    Examples
    --------
    Example 1: Dataframe
    >>> file1 = "https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv"
    >>> df1 = pd.read_csv(file1, sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
    >>> ex1 = df1['age']
    >>> ex1 = ex1.replace("89 OR OLDER", "90")
    >>> es_glass_delta(df1['sex'], ex1, control="FEMALE")
    0.04509629567422838
    
    Example 2: List
    >>> scores = [20,50,80,15,40,85,30,45,70,60, None, 90,25,40,70,65, None, 70,98,40]
    >>> groups = ["nat.","int.","int.","nat.","int.", "int.","nat.","nat.","int.","int.","int.","int.","int.","int.","nat.", "int." ,None,"nat.","int.","int."]
    >>> es_glass_delta(groups, scores)
    0.83604435914283
    
    '''
    #convert to pandas series if needed
    if type(catField) is list:
        catField = pd.Series(catField)
    
    if type(scaleField) is list:
        scaleField = pd.Series(scaleField)
    
    #combine as one dataframe
    df = pd.concat([catField, scaleField], axis=1)
    df = df.dropna()
    
    #the two categories
    if categories is not None:
        cat1 = categories[0]
        cat2 = categories[1]
    else:
        cat1 = df.iloc[:,0].value_counts().index[0]
        cat2 = df.iloc[:,0].value_counts().index[1]
    
    #separate the scores for each category
    x1 = list(df.iloc[:,1][df.iloc[:,0] == cat1])
    x2 = list(df.iloc[:,1][df.iloc[:,0] == cat2])
    
    #make sure they are floats
    x1 = [float(x) for x in x1]
    x2 = [float(x) for x in x2]
    
    n1 = len(x1)
    n2 = len(x2)
    n = n1 + n2
    
    var1 = variance(x1)
    var2 = variance(x2)
    m1 = mean(x1)
    m2 = mean(x2)
    
    sd1 = (var1)**0.5
    sd2 = (var2)**0.5
    
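    #the standard deviation of the control group (defaults to the second category)
    if control is None or control == cat2:
        s = sd2
    elif control == cat1:
        s = sd1
    else:
        raise ValueError("control must be one of the two categories being compared")
        
    gd = (m1 - m2) / s
    
    return gd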

Functions

def es_glass_delta(catField, scaleField, categories=None, dmu=0, control=None)

Glass Delta

An effect size measure when comparing two means, with a specified control group.

The measure is also described at PeterStatistics.com

Parameters

catField : dataframe or list
the categorical data
scaleField : dataframe or list
the scores
categories : list, optional
to indicate which two categories of catField to use; otherwise the two most frequent categories are used.
dmu : float, optional
difference according to null hypothesis (default is 0). Currently not used in the calculation.
control : string or float, optional
to indicate which category to use as control group. Default is the second of the two categories.
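
To make the parameter defaults concrete, here is a minimal standard-library sketch of the preparation step. It pairs each category with its score, drops pairs with a missing value, and selects the two most frequent categories. The helper name `prepare_groups` is hypothetical, and `collections.Counter` is an assumed stand-in for the pandas `dropna` and `value_counts` calls in the actual source, so handling of NaN values and of ties between equally frequent categories may differ.

```python
from collections import Counter

def prepare_groups(cat_field, scale_field, categories=None):
    """Hypothetical helper: split the scores into two groups, mimicking the defaults."""
    # keep only the pairs where both the category and the score are present
    pairs = [(c, float(x)) for c, x in zip(cat_field, scale_field)
             if c is not None and x is not None]
    if categories is None:
        # the two most frequent categories, most frequent first
        counts = Counter(c for c, _ in pairs)
        categories = [c for c, _ in counts.most_common(2)]
    cat1, cat2 = categories[0], categories[1]
    x1 = [x for c, x in pairs if c == cat1]
    x2 = [x for c, x in pairs if c == cat2]
    return cat1, cat2, x1, x2

scores = [20, 50, 80, 15, 40, 85, 30, 45, 70, 60, None, 90, 25, 40, 70, 65, None, 70, 98, 40]
groups = ["nat.", "int.", "int.", "nat.", "int.", "int.", "nat.", "nat.", "int.", "int.",
          "int.", "int.", "int.", "int.", "nat.", "int.", None, "nat.", "int.", "int."]

cat1, cat2, x1, x2 = prepare_groups(groups, scores)
print(cat1, len(x1))  # int. 12
print(cat2, len(x2))  # nat. 6
```

With no `control` given, the second group returned here ("nat.") is the one whose standard deviation ends up in the denominator.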

Returns

Glass Delta value
 

Notes

The formula used is (Glass, 1976, p. 7):

$$\delta = \frac{\bar{x}_1 - \bar{x}_2}{s_2}$$

With:

$$s_2 = \sqrt{\frac{\sum_{j=1}^{n_2} \left(x_{2,j} - \bar{x}_2\right)^2}{n_2 - 1}}$$

$$\bar{x}_i = \frac{\sum_{j=1}^{n_i} x_{i,j}}{n_i}$$

Symbols used:

  • \(x_{i,j}\) the j-th score in category i
  • \(n_i\) the number of scores in category i

Glass actually uses a ‘control group’, and \(s_2\) is then the standard deviation of the control group.
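
The formula can be checked by hand with the standard library. The sketch below assumes the two groups have already been separated and uses the non-missing scores from Example 2 in the Examples section. The helper name `glass_delta` and its integer `control` argument are hypothetical and not part of the package.

```python
from statistics import mean, stdev

def glass_delta(x1, x2, control=2):
    """Hypothetical helper: Glass delta with group 1 or group 2 as the control group."""
    if control not in (1, 2):
        raise ValueError("control must be 1 or 2")
    # stdev() uses the n - 1 denominator, matching the formula for s_2
    s = stdev(x2) if control == 2 else stdev(x1)
    return (mean(x1) - mean(x2)) / s

international = [50, 80, 40, 85, 70, 60, 90, 25, 40, 65, 98, 40]
national = [20, 15, 30, 45, 70, 70]

# national group as control (the default in this sketch)
d_default = glass_delta(international, national)
# international group as control: same mean difference, different denominator
d_swapped = glass_delta(international, national, control=1)

print(round(d_default, 5))  # 0.83604
```

Switching the control group changes only the standard deviation in the denominator, so the two values differ whenever the group variances differ.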

Before, After and Alternatives

Before the effect size you might want to run a test. Various options include ts_student_t_os for One-Sample Student t-Test, ts_trimmed_mean_os for One-Sample Trimmed (Yuen or Yuen-Welch) Mean Test, or ts_z_os for One-Sample Z Test.

Unfortunately, I've been unable to find any rule-of-thumb specifically for Glass Delta, but most likely the ones from Cohen d should be a decent alternative. These are available with th_cohen_d()
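
As an illustration of borrowing the Cohen d conventions, the sketch below labels an effect size using the commonly cited cut-offs of 0.2 (small), 0.5 (medium) and 0.8 (large). The helper `classify_cohen_d` and the label "negligible" are assumptions made for this example; th_cohen_d() may use different cut-offs, labels, or arguments.

```python
def classify_cohen_d(d):
    """Hypothetical helper: label an effect size using Cohen's conventional cut-offs."""
    size = abs(d)  # the sign only indicates the direction of the difference
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"

# the two results shown in the Examples section
print(classify_cohen_d(0.04509629567422838))  # negligible
print(classify_cohen_d(0.83604435914283))     # large
```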

Alternative effect sizes include: Common Language, Cohen d_s, Cohen U, Hedges g

or the correlation coefficients: biserial, point-biserial

References

Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5(10), 3–8. https://doi.org/10.3102/0013189X005010003

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076

Examples

Example 1: Dataframe

>>> file1 = "https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv"
>>> df1 = pd.read_csv(file1, sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = df1['age']
>>> ex1 = ex1.replace("89 OR OLDER", "90")
>>> es_glass_delta(df1['sex'], ex1, control="FEMALE")
0.04509629567422838

Example 2: List

>>> scores = [20,50,80,15,40,85,30,45,70,60, None, 90,25,40,70,65, None, 70,98,40]
>>> groups = ["nat.","int.","int.","nat.","int.", "int.","nat.","nat.","int.","int.","int.","int.","int.","int.","nat.", "int." ,None,"nat.","int.","int."]
>>> es_glass_delta(groups, scores)
0.83604435914283