Module stikpetP.effect_sizes.eff_size_common_language_is

from statistics import mean, variance, NormalDist
import pandas as pd
from scipy.stats import rankdata

def es_common_language_is(catField, scores, categories=None, levels=None, dmu=0, method="brute"):
    '''
    Common Language Effect Size (Independent Samples)
    -----------------------------------------------
    The Common Language Effect Size (a.k.a. Probability of Superiority) is the probability that, for a random pair of scores taken from two categories, the score from the first category is greater than the score from the second, i.e. 
    $$P(X > Y)$$
    
    Note however that Wolfe and Hogg (1971) actually had this in reverse, i.e.
    
    $$P(X \\leq Y)$$
    
    Some will also argue to count ties equally to each of the two categories (Grissom, 1994, p. 282), which makes the definition:
    
    $$P(X > Y) + \\frac{P(X = Y)}{2}$$

    It was further developed by Vargha and Delaney (2000) especially in light of a Mann-Whitney U test.
        
    For scale data, an approximation using the standard normal distribution is also available.
    
    The term Common Language Effect Size can be found in McGraw and Wong (1992), the term Probability of Superiority is found in Grissom (1994), and the term Stochastic Superiority in Vargha and Delaney (2000).

    The measure is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/EffectSizes/CommonLanguageEffectSize.html)

    Parameters
    ----------
    catField : dataframe or list 
        the categorical data
    scores : dataframe or list
        the scores
    categories : list, optional 
        to indicate which two categories of catField to use, otherwise the two most frequently occurring categories will be used.
    levels : list or dictionary, optional
        the scores in order
    dmu : float, optional 
        difference according to null hypothesis (default is 0)
    method : {"brute", "appr", "vda", "brute-it"}, optional
        method to use. "brute" uses a brute-force count over all pairs that splits ties evenly, "brute-it" is the same as "brute" but ignores ties, "vda" uses the calculation from Vargha-Delaney, and "appr" a normal approximation from McGraw-Wong
        
    Returns
    -------
    A dataframe with:
    
    * *CLE cat. 1*, the effect size for the first category
    * *CLE cat. 2*, the effect size for the second category
    
    Notes
    ------
    For "brute" simply all possible pairs are determined and half of the ties are added, i.e. (Grissom, 1994, p. 282):
    $$P(X > Y) + \\frac{P(X = Y)}{2}$$

    With "brute-it" the ties are ignored (it = ignore ties):
    $$P(X > Y)$$

    The "appr" uses the approximation from McGraw and Wong (1992, p. 361):
    $$CL = \\Phi\\left(z\\right)$$
    
    With:
    $$z = \\frac{\\left|\\bar{x}_1 - \\bar{x}_2\\right|}{\\sqrt{s_1^2 + s_2^2}}$$
    $$s_i^2 = \\frac{\\sum_{j=1}^{n_i} \\left(x_{i,j} - \\bar{x}_i\\right)^2}{n_i - 1}$$
    $$\\bar{x}_i = \\frac{\\sum_{j=1}^{n_i} x_{i,j}}{n_i}$$
    
    *Symbols used:*
    
    * \\(x_{i,j}\\) the j-th score in category i
    * \\(n_i\\) the number of scores in category i
    * \\(\\Phi\\left(\\dots\\right)\\) the cumulative distribution function of the standard normal distribution

    The "vda" uses the formula from Vargha and Delaney (2000, p. 107):
    $$A = \\frac{1}{n_j}\\times\\left(\\frac{R_i}{n_i} - \\frac{n_i + 1}{2}\\right)$$
    
    *with*
    * \\(R_i\\) the sum of the ranks in category i
    * \\(n_j\\) the number of scores in the other category
    
    It could also be calculated from the Mann-Whitney U value:
    $$A = \\frac{U}{n_1\\times n_2}$$
    
    Note that the difference between the two options (using category 1 or category 2) will be the deviation from 0.5. If all scores in the first category are lower than the scores in the second, A will be 0 using the first category, and 1 for the second.
    
    If the number of pairs where the first category's score is higher equals the number of pairs where it is lower, A (no matter which category is used) will be 0.5.
    
    The CLE can be converted to a Rank Biserial (= Cliff delta) using the **es_convert()** function. This can then be converted to a Cohen d, and then the rules-of-thumb for Cohen d could be used (**th_cohen_d()**)
    
    The CLE for the other category is simply 1 - CLE, except for the case where ties are ignored ("brute-it").

    Before, After and Alternatives
    ------------------------------
    Before the effect size you might want to run a test. Various options include [ts_student_t_os](../tests/test_student_t_os.html#ts_student_t_os) for One-Sample Student t-Test, [ts_trimmed_mean_os](../tests/test_trimmed_mean_os.html#ts_trimmed_mean_os) for One-Sample Trimmed (Yuen or Yuen-Welch) Mean Test, or [ts_z_os](../tests/test_z_os.html#ts_z_os) for One-Sample Z Test.

    To get some rule-of-thumb convert a CLE to a rank-biserial using the [es_convert()](../effect_sizes/convert_es.html) function, and set `fr="cle", to="rb"`, or convert the result to Cohen d, use `fr="rb", to="cohend"`.

    Then use the rules of thumb for the rank-biserial using [th_rank_biserial()](../other/thumb_rank_biserial.html), or for Cohen d use [th_cohen_d()](../other/thumb_cohen_d.html).
    
    Alternative effect sizes include: [Common Language](../effect_sizes/eff_size_common_language_is.html), [Cohen d_s](../effect_sizes/eff_size_hedges_g_is.html), [Cohen U](../effect_sizes/eff_size_cohen_u.html), [Hedges g](../effect_sizes/eff_size_hedges_g_is.html), [Glass delta](../effect_sizes/eff_size_glass_delta.html)
    
    or the correlation coefficients: [biserial](../correlations/cor_biserial.html), [point-biserial](../effect_sizes/cor_point_biserial.html)
    
    References
    ----------
    Grissom, R. J. (1994). Statistical analysis of ordinal categorical status after therapies. *Journal of Consulting and Clinical Psychology, 62*(2), 281–284. doi:10.1037/0022-006X.62.2.281
    
    McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. *Psychological Bulletin, 111*(2), 361–365. doi:10.1037/0033-2909.111.2.361

    Vargha, A., & Delaney, H. D. (2000). A critique and improvement of the CL common language effect size statistics of McGraw and Wong. *Journal of Educational and Behavioral Statistics, 25*(2), 101–132. doi:10.3102/10769986025002101
    
    Wolfe, D. A., & Hogg, R. V. (1971). On constructing statistics and reporting data. *The American Statistician, 25*(4), 27–30. doi:10.1080/00031305.1971.10477278
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076
    
    '''
    #convert to pandas series if needed
    if type(catField) is list:
        catField = pd.Series(catField)
    
    if type(scores) is list:
        scores = pd.Series(scores)
    
    #combine as one dataframe
    df = pd.concat([catField, scores], axis=1)
    df = df.dropna()

    #replace the ordinal values if levels is provided
    if levels is not None:
        df.iloc[:,1] = df.iloc[:,1].replace(levels)
    df.iloc[:,1] = pd.to_numeric(df.iloc[:,1])
    
    #the two categories
    if categories is not None:
        cat1 = categories[0]
        cat2 = categories[1]
    else:
        cat1 = df.iloc[:,0].value_counts().index[0]
        cat2 = df.iloc[:,0].value_counts().index[1]
    
    #separate the scores for each category
    x1 = list(df.iloc[:,1][df.iloc[:,0] == cat1])
    x2 = list(df.iloc[:,1][df.iloc[:,0] == cat2])
    
    #make sure they are floats
    x1 = [float(x) for x in x1]
    x2 = [float(x) for x in x2]
    
    n1 = len(x1)
    n2 = len(x2)
    n = n1 + n2
    
    var1 = variance(x1)
    var2 = variance(x2)
    m1 = mean(x1)
    m2 = mean(x2)
    
    if method=="appr":
        # note: unlike the absolute-value formula in the notes, the sign of the
        # difference is kept (and dmu subtracted), so c1 can fall below 0.5
        z = (m1 - m2 - dmu)/(var1 + var2)**0.5

        c1 = NormalDist().cdf(z)
        c2 = 1 - c1

    elif method=="vda":
        #combine this into one long list
        allScores = x1 + x2
        
        #get the ranks
        allRanks = rankdata(allScores)
        
        #get the ranks per category
        cat1Ranks = allRanks[0:n1]
        cat2Ranks = allRanks[n1:n]
        
        r1 = sum(cat1Ranks)
        r2 = sum(cat2Ranks)
        
        c1 = 1 / n2 * (r1 / n1 - (n1 + 1) / 2)
        c2 = 1 / n1 * (r2 / n2 - (n2 + 1) / 2)
    
    elif method=="brute" or method=="brute-it":
        difs = [i - j for i in x1 for j in x2]
        # total number of pairs
        n = len(difs)
        
        xGTy = sum([i > 0 for i in difs])
        
        if method=="brute":
            #Counting ties half to each
            
            # number of pairs with score in first category being equal to the second
            xEQy = sum([i == 0 for i in difs])
            
            # probability
            c1 = xGTy/n + 1/2*(xEQy/n)
            c2 = 1 - c1
        else:
            c1 = xGTy/n
            c2 = sum([i < 0 for i in difs])/n
            
    #the results (cast the categories to str in case they are not strings)
    colnames = ["CLE " + str(cat1), "CLE " + str(cat2)]
    results = pd.DataFrame([[c1, c2]], columns=colnames)
    
    return results
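The brute-force count described in the notes can be sketched standalone (plain Python, no dependency on this module; the sample scores below are made up for illustration):

```python
# Brute-force Common Language Effect Size: count all cross-category pairs,
# giving half a count to ties (Grissom, 1994).
x1 = [1, 2, 3, 4]  # hypothetical scores in category 1
x2 = [2, 3, 4, 5]  # hypothetical scores in category 2

pairs = [(i, j) for i in x1 for j in x2]         # all n1 * n2 pairs
n_pairs = len(pairs)                             # 4 * 4 = 16
greater = sum(i > j for i, j in pairs)           # pairs where category 1 wins
ties = sum(i == j for i, j in pairs)             # tied pairs

cle1 = greater / n_pairs + 0.5 * ties / n_pairs  # P(X > Y) + P(X = Y)/2
cle2 = 1 - cle1
print(cle1, cle2)  # 0.28125 0.71875
```

With these scores, 3 of the 16 pairs favour category 1 and 3 are tied, giving 3/16 + 1.5/16 = 0.28125, the same value the "brute" method of the function above would report.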

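The equivalence with the Mann-Whitney U mentioned in the notes, \(A = U / (n_1 \times n_2)\), can be checked with scipy (the sample scores are made up; scipy's `mannwhitneyu` handles ties via mid-ranks, so the result matches the tie-splitting brute-force CLE):

```python
from scipy.stats import mannwhitneyu

x1 = [1, 2, 3, 4]  # hypothetical scores, category 1
x2 = [2, 3, 4, 5]  # hypothetical scores, category 2

u1 = mannwhitneyu(x1, x2).statistic  # U statistic of the first sample
a = u1 / (len(x1) * len(x2))         # A = U / (n1 * n2)
print(a)  # 0.28125
```

This gives the same 0.28125 as counting all pairs directly, since the mid-rank treatment of ties in U corresponds to adding half of P(X = Y).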