Module stikpetP.effect_sizes.eff_size_common_language_is
from statistics import mean, variance, NormalDist
import pandas as pd
from scipy.stats import rankdata
def es_common_language_is(catField, scores, categories=None, levels=None, dmu=0, method="brute"):
'''
Common Language Effect Size (Independent Samples)
-----------------------------------------------
The Common Language Effect Size (a.k.a. Probability of Superiority) is the probability that, for a random pair with one score from each of two categories, the score from the first category is greater than the score from the second, i.e.
$$P(X > Y)$$
Note however that Wolfe and Hogg (1971) actually had this in reverse, i.e.
$$P(X \\leq Y)$$
Some will also argue to count ties equally to each of the two categories (Grissom, 1994, p. 282), which makes the definition:
$$P(X > Y) + \\frac{P(X = Y)}{2}$$
It was further developed by Vargha and Delaney (2000) especially in light of a Mann-Whitney U test.
For scale data, an approximation using the standard normal distribution is also available.
The term Common Language Effect Size can be found in McGraw and Wong (1992), the term Probability of Superiority in Grissom (1994), and the term Stochastic Superiority in Vargha and Delaney (2000).
The measure is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/EffectSizes/CommonLanguageEffectSize.html)
Parameters
----------
catField : dataframe or list
the categorical data
scores : dataframe or list
the scores
categories : list, optional
to indicate which two categories of catField to use; otherwise the two most frequent categories will be used.
levels : list or dictionary, optional
the scores in order
dmu : float, optional
difference according to null hypothesis (default is 0)
method : {"brute", "appr", "vda", "brute-it"}, optional
method to use. "brute" will use a brute-force approach that splits ties evenly, "brute-it" is the same as "brute" but ignores ties, "vda" will use the calculation from Vargha-Delaney, and "appr" a normal approximation from McGraw-Wong
Returns
-------
A dataframe with:
* *CLE cat. 1*, the effect size for the first category
* *CLE cat. 2*, the effect size for the second category
Notes
------
For "brute" simply all possible pairs are determined and half of the ties are added, i.e. (Grissom, 1994, p. 282):
$$P(X > Y) + \\frac{P(X = Y)}{2}$$
With "brute-it" the ties are ignored (it = ignore ties):
$$P(X > Y)$$
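Both brute-force variants can be sketched directly by enumerating all pairs. A minimal sketch with made-up scores (the values are illustrative only, not data from the text):

```python
# Brute-force pair counting for the CLE (hypothetical scores)
x1 = [1, 2, 3, 4]
x2 = [2, 2, 3, 5]

difs = [i - j for i in x1 for j in x2]   # all n1*n2 pairwise differences
n_pairs = len(difs)                      # 16 pairs
gt = sum(d > 0 for d in difs)            # pairs where the first-category score wins
eq = sum(d == 0 for d in difs)           # tied pairs

cle_brute = gt / n_pairs + (eq / n_pairs) / 2   # "brute": ties split evenly
cle_brute_it = gt / n_pairs                     # "brute-it": ties ignored
print(cle_brute, cle_brute_it)                  # 0.40625 0.3125
```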
The "appr" uses the approximation from McGraw and Wong (1992, p. 361):
$$CL = \\Phi\\left(z\\right)$$
With:
$$z = \\frac{\\left|\\bar{x}_1 - \\bar{x}_2\\right|}{\\sqrt{s_1^2 + s_2^2}}$$
$$s_i^2 = \\frac{\\sum_{j=1}^{n_i} \\left(x_{i,j} - \\bar{x}_i\\right)^2}{n_i - 1}$$
$$\\bar{x}_i = \\frac{\\sum_{j=1}^{n_i} x_{i,j}}{n_i}$$
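A minimal sketch of this approximation, again with made-up scores (illustrative only):

```python
from statistics import mean, variance, NormalDist

# McGraw-Wong normal approximation of the CLE (hypothetical scores)
x1 = [5.0, 6.0, 7.0, 8.0]
x2 = [4.0, 5.0, 5.0, 6.0]

# z as defined above: absolute mean difference over sqrt of summed sample variances
z = abs(mean(x1) - mean(x2)) / (variance(x1) + variance(x2)) ** 0.5
cl = NormalDist().cdf(z)   # probability a random x1 score exceeds a random x2 score
print(round(z, 4), round(cl, 4))
```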
*Symbols used:*
* \\(x_{i,j}\\) the j-th score in category i
* \\(n_i\\) the number of scores in category i
* \\(\\Phi\\left(\\dots\\right)\\) the cumulative distribution function of the standard normal distribution
The "vda" uses the formula from Vargha and Delaney (2000, p. 107):
$$A = \\frac{1}{n_j}\\times\\left(\\frac{R_i}{n_i} - \\frac{n_i + 1}{2}\\right)$$
*with*
* \\(R_i\\) the sum of the ranks in category i
* \\(n_j\\) the number of scores in the other category
It could also be calculated from the Mann-Whitney U value:
$$A = \\frac{U}{n_1\\times n_2}$$
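The rank-based route can be sketched as follows, with made-up scores; the second computation recovers the same value through the Mann-Whitney U of the first category (U = R_1 - n_1(n_1+1)/2):

```python
from scipy.stats import rankdata

# Vargha-Delaney A from pooled ranks (hypothetical scores)
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 2.0, 3.0, 5.0]
n1, n2 = len(x1), len(x2)

ranks = rankdata(x1 + x2)              # mid-ranks over the pooled scores
r1 = float(ranks[:n1].sum())           # rank sum of the first category
a1 = (r1 / n1 - (n1 + 1) / 2) / n2

# the same value via the Mann-Whitney U of the first category
u1 = r1 - n1 * (n1 + 1) / 2
a_from_u = u1 / (n1 * n2)
print(a1, a_from_u)                    # both 0.40625
```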
Note that the difference between the two options (using category 1 or category 2) will be the deviation from 0.5. If all scores in the first category are lower than the scores in the second, A will be 0 using the first category, and 1 for the second.
If the number of pairs in which the score from the first category is higher than that from the second equals the number of pairs the other way around, A (no matter which category is used) will be 0.5.
The CLE can be converted to a Rank Biserial (= Cliff delta) using the **es_convert()** function. This can then be converted to a Cohen d, and then the rules-of-thumb for Cohen d could be used (**th_cohen_d()**)
The CLE for the other category is simply 1 - CLE, except for the case where ties are ignored ("brute-it").
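Numerically the complement and the rank-biserial conversion are straightforward (a sketch; 2×CLE − 1 rescales [0, 1] onto [−1, 1], matching Cliff delta):

```python
# Complement and rank-biserial conversion (sketch, illustrative value)
cle_cat1 = 0.40625
cle_cat2 = 1 - cle_cat1        # 0.59375, the CLE for the other category
rb = 2 * cle_cat1 - 1          # -0.1875, negative: first category tends to score lower
```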
Before, After and Alternatives
------------------------------
Before the effect size you might want to run a test. Various options include [ts_student_t_os](../tests/test_student_t_os.html#ts_student_t_os) for One-Sample Student t-Test, [ts_trimmed_mean_os](../tests/test_trimmed_mean_os.html#ts_trimmed_mean_os) for One-Sample Trimmed (Yuen or Yuen-Welch) Mean Test, or [ts_z_os](../tests/test_z_os.html#ts_z_os) for One-Sample Z Test.
To get a rule-of-thumb interpretation, convert the CLE to a rank-biserial using the [es_convert()](../effect_sizes/convert_es.html) function with `fr="cle", to="rb"`, or convert that result to Cohen d with `fr="rb", to="cohend"`.
Then use the rules of thumb for the rank-biserial with [th_rank_biserial()](../other/thumb_rank_biserial.html), or for Cohen d with [th_cohen_d()](../other/thumb_cohen_d.html).
Alternative effect sizes include: [Common Language](../effect_sizes/eff_size_common_language_is.html), [Cohen d_s](../effect_sizes/eff_size_hedges_g_is.html), [Cohen U](../effect_sizes/eff_size_cohen_u.html), [Hedges g](../effect_sizes/eff_size_hedges_g_is.html), [Glass delta](../effect_sizes/eff_size_glass_delta.html)
or the correlation coefficients: [biserial](../correlations/cor_biserial.html), [point-biserial](../effect_sizes/cor_point_biserial.html)
References
----------
Grissom, R. J. (1994). Statistical analysis of ordinal categorical status after therapies. *Journal of Consulting and Clinical Psychology, 62*(2), 281–284. doi:10.1037/0022-006X.62.2.281
McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. *Psychological Bulletin, 111*(2), 361–365. doi:10.1037/0033-2909.111.2.361
Vargha, A., & Delaney, H. D. (2000). A critique and improvement of the CL common language effect size statistics of McGraw and Wong. *Journal of Educational and Behavioral Statistics, 25*(2), 101–132. doi:10.3102/10769986025002101
Wolfe, D. A., & Hogg, R. V. (1971). On constructing statistics and reporting data. *The American Statistician, 25*(4), 27–30. doi:10.1080/00031305.1971.10477278
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
'''
    #convert to pandas series if needed
    if type(catField) is list:
        catField = pd.Series(catField)
    if type(scores) is list:
        scores = pd.Series(scores)

    #combine as one dataframe and drop missing values
    df = pd.concat([catField, scores], axis=1)
    df = df.dropna()

    #replace the ordinal values if levels is provided
    if levels is not None:
        df.iloc[:,1] = df.iloc[:,1].replace(levels)
    df.iloc[:,1] = pd.to_numeric(df.iloc[:,1])

    #the two categories
    if categories is not None:
        cat1 = categories[0]
        cat2 = categories[1]
    else:
        cat1 = df.iloc[:,0].value_counts().index[0]
        cat2 = df.iloc[:,0].value_counts().index[1]

    #separate the scores for each category
    x1 = list(df.iloc[:,1][df.iloc[:,0] == cat1])
    x2 = list(df.iloc[:,1][df.iloc[:,0] == cat2])
    #make sure they are floats
    x1 = [float(x) for x in x1]
    x2 = [float(x) for x in x2]
    n1 = len(x1)
    n2 = len(x2)
    n = n1 + n2
    var1 = variance(x1)
    var2 = variance(x2)
    m1 = mean(x1)
    m2 = mean(x2)

    if method=="appr":
        z = (m1 - m2 - dmu)/(var1 + var2)**0.5
        c1 = NormalDist().cdf(z)
        c2 = 1 - c1
    elif method=="vda":
        #combine the scores into one long list
        allScores = x1 + x2
        #get the ranks over the pooled scores
        allRanks = rankdata(allScores)
        #get the ranks per category
        cat1Ranks = allRanks[0:n1]
        cat2Ranks = allRanks[n1:n]
        r1 = sum(cat1Ranks)
        r2 = sum(cat2Ranks)
        c1 = 1 / n2 * (r1 / n1 - (n1 + 1) / 2)
        c2 = 1 / n1 * (r2 / n2 - (n2 + 1) / 2)
    elif method=="brute" or method=="brute-it":
        difs = [i - j for i in x1 for j in x2]
        #total number of pairs
        n = len(difs)
        xGTy = sum([i > 0 for i in difs])
        if method=="brute":
            #counting ties half to each category
            xEQy = sum([i == 0 for i in difs])
            c1 = xGTy/n + 1/2*(xEQy/n)
            c2 = 1 - c1
        else:
            c1 = xGTy/n
            c2 = sum([i < 0 for i in difs])/n

    #the results
    colnames = ["CLE " + str(cat1), "CLE " + str(cat2)]
    results = pd.DataFrame([[c1, c2]], columns=colnames)
    return results