Module `stikpetP.effect_sizes.eff_size_glass_delta`

from statistics import mean, variance
import pandas as pd

def es_glass_delta(catField, scaleField, categories=None, dmu=0, control=None):
    '''
    Glass Delta
    -----------
    An effect size measure when comparing two means, with a specified control group.

    The measure is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/EffectSizes/GlassDelta.html)
    
    Parameters
    ----------
    catField : dataframe or list 
        the categorical data
    scaleField : dataframe or list
        the scores
    categories : list, optional 
        the two categories of catField to compare; if omitted, the two most frequent categories are used.
    dmu : float, optional 
        difference according to null hypothesis (default is 0). Currently not used in the calculation.
    control : string or float, optional 
        the category to use as control group. Default is the second category.
    
    Returns
    -------
    Glass Delta value
    
    Notes
    -----
    The formula used is (Glass, 1976, p. 7):
    $$\\delta = \\frac{\\bar{x}_1 - \\bar{x}_2}{s_2}$$
    
    With:
    $$s_2 = \\sqrt{\\frac{\\sum_{j=1}^{n_2} \\left(x_{2,j} - \\bar{x}_2\\right)^2}{n_2 - 1}}$$
    $$\\bar{x}_i = \\frac{\\sum_{j=1}^{n_i} x_{i,j}}{n_i}$$
    
    *Symbols used:*
    
    * \\(x_{i,j}\\) the j-th score in category i
    * \\(n_i\\) the number of scores in category i
    
    Glass uses a ‘control group’; \\(s_2\\) is the standard deviation of that control group.

    Before, After and Alternatives
    ------------------------------
    Before the effect size you might want to run a test. Various options include [ts_student_t_os](../tests/test_student_t_os.html#ts_student_t_os) for One-Sample Student t-Test, [ts_trimmed_mean_os](../tests/test_trimmed_mean_os.html#ts_trimmed_mean_os) for One-Sample Trimmed (Yuen or Yuen-Welch) Mean Test, or [ts_z_os](../tests/test_z_os.html#ts_z_os) for One-Sample Z Test.

    Unfortunately, I've been unable to find any rule-of-thumb specifically for Glass Delta, but most likely the ones from Cohen d should be a decent alternative. These are available with [th_cohen_d()](../other/thumb_cohen_d.html)
    
    Alternative effect sizes include: [Common Language](../effect_sizes/eff_size_common_language_is.html), [Cohen d_s](../effect_sizes/eff_size_hedges_g_is.html), [Cohen U](../effect_sizes/eff_size_cohen_u.html), [Hedges g](../effect_sizes/eff_size_hedges_g_is.html)
    
    or the correlation coefficients: [biserial](../correlations/cor_biserial.html), [point-biserial](../effect_sizes/cor_point_biserial.html)
    
    References 
    ----------
    Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. *Educational Researcher, 5*(10), 3–8. https://doi.org/10.3102/0013189X005010003
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076
    
    Examples
    --------
    Example 1: Dataframe
    >>> file1 = "https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv"
    >>> df1 = pd.read_csv(file1, sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
    >>> ex1 = df1['age']
    >>> ex1 = ex1.replace("89 OR OLDER", "90")
    >>> es_glass_delta(df1['sex'], ex1, control="FEMALE")
    0.04509629567422838
    
    Example 2: List
    >>> scores = [20,50,80,15,40,85,30,45,70,60, None, 90,25,40,70,65, None, 70,98,40]
    >>> groups = ["nat.","int.","int.","nat.","int.", "int.","nat.","nat.","int.","int.","int.","int.","int.","int.","nat.", "int." ,None,"nat.","int.","int."]
    >>> es_glass_delta(groups, scores)
    0.83604435914283
    
    '''
    #convert to pandas series if needed
    if type(catField) is list:
        catField = pd.Series(catField)
    
    if type(scaleField) is list:
        scaleField = pd.Series(scaleField)
    
    #combine as one dataframe
    df = pd.concat([catField, scaleField], axis=1)
    df = df.dropna()
    
    #the two categories
    if categories is not None:
        cat1 = categories[0]
        cat2 = categories[1]
    else:
        cat1 = df.iloc[:,0].value_counts().index[0]
        cat2 = df.iloc[:,0].value_counts().index[1]
    
    #separate the scores for each category
    x1 = list(df.iloc[:,1][df.iloc[:,0] == cat1])
    x2 = list(df.iloc[:,1][df.iloc[:,0] == cat2])
    
    #make sure they are floats
    x1 = [float(x) for x in x1]
    x2 = [float(x) for x in x2]
    
    n1 = len(x1)
    n2 = len(x2)
    n = n1 + n2
    
    var1 = variance(x1)
    var2 = variance(x2)
    m1 = mean(x1)
    m2 = mean(x2)
    
    sd1 = (var1)**0.5
    sd2 = (var2)**0.5
    
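    #use the standard deviation of the control group (default: second category)
    if control is None or control==cat2:
        s = sd2
    elif control==cat1:
        s = sd1
    else:
        raise ValueError("control must be one of the two categories being compared")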
        
    gd = (m1- m2)/s
    
    return(gd)

Functions

def es_glass_delta(catField, scaleField, categories=None, dmu=0, control=None)

Glass Delta

An effect size measure when comparing two means, with a specified control group.

The measure is also described at PeterStatistics.com

Parameters

catField : dataframe or list: the categorical data
scaleField : dataframe or list: the scores
categories : list, optional: the two categories of catField to compare; if omitted, the two most frequent categories are used.
dmu : float, optional: difference according to null hypothesis (default is 0). Currently not used in the calculation.
control : string or float, optional: the category to use as control group. Default is the second category.

Returns

Glass Delta value

Notes

The formula used is (Glass, 1976, p. 7): $\delta = \frac{\bar{x}_1 - \bar{x}_2}{s_2}$

With:

$s_2 = \sqrt{\frac{\sum_{j=1}^{n_2} \left(x_{2,j} - \bar{x}_2\right)^2}{n_2 - 1}}$

$\bar{x}_i = \frac{\sum_{j=1}^{n_i} x_{i,j}}{n_i}$

Symbols used:

$x_{i,j}$ the j-th score in category i
$n_i$ the number of scores in category i

Glass uses a ‘control group’; $s_2$ is the standard deviation of that control group.
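
As a rough illustration of the formula, the sketch below computes the value by hand with Python's standard `statistics` module. The two lists are the non-missing scores from Example 2, split by group. The variable names are illustrative and not part of the package, and the "nat." group is assumed to play the role of the control group (group 2).

```python
from statistics import mean, stdev

# Non-missing scores from Example 2, split by group (names are illustrative).
x1 = [50, 80, 40, 85, 70, 60, 90, 25, 40, 65, 98, 40]  # "int." group
x2 = [20, 15, 30, 45, 70, 70]                          # "nat." group, assumed control

# s_2: sample standard deviation (n - 1 denominator) of the control group.
s2 = stdev(x2)

# Glass delta: difference in means divided by the control group's standard deviation.
delta = (mean(x1) - mean(x2)) / s2
print(round(delta, 4))  # 0.836
```

The result agrees with the value reported for Example 2.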

Before, After and Alternatives

Before the effect size you might want to run a test. Various options include ts_student_t_os for One-Sample Student t-Test, ts_trimmed_mean_os for One-Sample Trimmed (Yuen or Yuen-Welch) Mean Test, or ts_z_os for One-Sample Z Test.

Unfortunately, I've been unable to find any rule-of-thumb specifically for Glass Delta, but most likely the ones from Cohen d should be a decent alternative. These are available with th_cohen_d()

Alternative effect sizes include: Common Language, Cohen d_s, Cohen U, Hedges g

or the correlation coefficients: biserial, point-biserial

References

Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5(10), 3–8. https://doi.org/10.3102/0013189X005010003

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076

Examples

Example 1: Dataframe

>>> file1 = "https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv"
>>> df1 = pd.read_csv(file1, sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = df1['age']
>>> ex1 = ex1.replace("89 OR OLDER", "90")
>>> es_glass_delta(df1['sex'], ex1, control="FEMALE")
0.04509629567422838

Example 2: List

>>> scores = [20,50,80,15,40,85,30,45,70,60, None, 90,25,40,70,65, None, 70,98,40]
>>> groups = ["nat.","int.","int.","nat.","int.", "int.","nat.","nat.","int.","int.","int.","int.","int.","int.","nat.", "int." ,None,"nat.","int.","int."]
>>> es_glass_delta(groups, scores)
0.83604435914283
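
The choice of control group changes the result, because only that group's standard deviation enters the denominator. The sketch below is a standalone illustration using only the standard library, not the package's implementation; `glass_delta` is a hypothetical helper. It drops pairs with a missing score or group, as the function does with `dropna`, and then computes the value with each group in turn as control.

```python
from statistics import mean, stdev

scores = [20, 50, 80, 15, 40, 85, 30, 45, 70, 60, None, 90, 25, 40, 70, 65, None, 70, 98, 40]
groups = ["nat.", "int.", "int.", "nat.", "int.", "int.", "nat.", "nat.", "int.", "int.",
          "int.", "int.", "int.", "int.", "nat.", "int.", None, "nat.", "int.", "int."]

# Keep only complete (group, score) pairs.
pairs = [(g, float(s)) for g, s in zip(groups, scores)
         if g is not None and s is not None]


def glass_delta(x1, x2, control):
    """Hypothetical helper: (mean(x1) - mean(x2)) / sample sd of the control scores."""
    return (mean(x1) - mean(x2)) / stdev(control)


x_int = [s for g, s in pairs if g == "int."]
x_nat = [s for g, s in pairs if g == "nat."]

# Same mean difference, different denominator.
delta_nat_control = glass_delta(x_int, x_nat, control=x_nat)
delta_int_control = glass_delta(x_int, x_nat, control=x_int)
print(delta_nat_control, delta_int_control)
```

With "nat." as control the value matches Example 2; with "int." as control it is slightly larger (about 0.87), since the "int." scores are a little less spread out.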
