Module stikpetP.effect_sizes.eff_size_common_language_is
from statistics import mean, variance, NormalDist
import pandas as pd
from scipy.stats import rankdata
def es_common_language_is(catField, scores, categories=None, levels=None, dmu=0, method="brute"):
'''
Common Language Effect Size (Independent Samples)
-----------------------------------------------
The Common Language Effect Size (a.k.a. Probability of Superiority) is the probability that, for a random pair of scores taken from two categories, the score from the first category is greater than the score from the second, i.e.
$$P(X > Y)$$
Note however that Wolfe and Hogg (1971) actually had this in reverse, i.e.
$$P(X \\leq Y)$$
Some will also argue to count ties equally to each of the two categories (Grissom, 1994, p. 282), which makes the definition:
$$P(X > Y) + \\frac{P(X = Y)}{2}$$
It was further developed by Vargha and Delaney (2000) especially in light of a Mann-Whitney U test.
For scale data, an approximation using the standard normal distribution is also available.
The term Common Language Effect Size can be found in McGraw and Wong (1992), the term Probability of Superiority in Grissom (1994), and the term Stochastic Superiority in Vargha and Delaney (2000).
The function is shown in this [YouTube video](https://youtu.be/4oAQNaqPmBc) and the measure is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/EffectSizes/CommonLanguageEffectSize.html)
Parameters
----------
catField : dataframe or list
the categorical data
scores : dataframe or list
the scores
categories : list, optional
to indicate which two categories of catField to use; otherwise the two most frequent categories will be used.
levels : list or dictionary, optional
the scores in order
dmu : float, optional
difference according to null hypothesis (default is 0)
method : {"brute", "appr-mw", "appr-wh", "vda", "brute-it"}, optional
method to use. "brute" uses a brute-force approach that splits ties evenly, "brute-it" is the same as "brute" but ignores ties, "vda" uses the calculation from Vargha-Delaney, "appr-mw" the normal approximation from McGraw-Wong, and "appr-wh" the normal approximation from Wolfe-Hogg
Returns
-------
A dataframe with:
* *CLE cat. 1*, the effect size for the first category
* *CLE cat. 2*, the effect size for the second category
Notes
------
For "brute" simply all possible pairs are determined and half of the ties are added, i.e. (Grissom, 1994, p. 282):
$$P(X > Y) + \\frac{P(X = Y)}{2}$$
With "brute-it" the ties are ignored (it = ignore ties):
$$P(X > Y)$$
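These pairwise counts can be sketched directly; a minimal standalone example with made-up data:

```python
from itertools import product

def cle_brute(x, y, ignore_ties=False):
    #compare every score in x with every score in y
    pairs = list(product(x, y))
    greater = sum(1 for a, b in pairs if a > b)
    if ignore_ties:
        #"brute-it": P(X > Y) only
        return greater / len(pairs)
    #"brute": ties count half to each category
    ties = sum(1 for a, b in pairs if a == b)
    return (greater + ties / 2) / len(pairs)

print(cle_brute([1, 2, 2, 3], [2, 3, 4, 4]))        # → 0.15625
print(cle_brute([1, 2, 2, 3], [2, 3, 4, 4], True))  # → 0.0625
```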
The "appr-mw" uses the approximation from McGraw and Wong (1992, p. 361):
$$CL = \\Phi\\left(z\\right)$$
With:
$$z = \\frac{\\left|\\bar{x}_1 - \\bar{x}_2\\right|}{\\sqrt{s_1^2 + s_2^2}}$$
$$s_i^2 = \\frac{\\sum_{j=1}^{n_i} \\left(x_{i,j} - \\bar{x}_i\\right)^2}{n_i - 1}$$
$$\\bar{x}_i = \\frac{\\sum_{j=1}^{n_i} x_{i,j}}{n_i}$$
While "appr-wh" uses the approximation from Wolfe and Hogg (1971, p. 28, eq. 2), which uses for the z calculation:
$$z = \\frac{\\left|\\bar{x}_1 - \\bar{x}_2\\right|}{\\sqrt{2}\\times s}$$
$$s = \\sqrt{\\frac{s_1^2\\times\\left(n_1 - 1\\right) + s_2^2\\times\\left(n_2 - 1\\right)}{n_1 + n_2}}$$
*Symbols used:*
* \\(x_{i,j}\\) the j-th score in category i
* \\(n_i\\) the number of scores in category i
* \\(\\Phi\\left(\\dots\\right)\\) the cumulative distribution function of the standard normal distribution
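Both approximations can be sketched with only the standard library; the sample data is made up for illustration, and the signed difference (m1 - m2 - dmu) is used, as in the implementation, rather than the absolute value:

```python
from statistics import mean, variance, NormalDist

x1 = [4, 5, 6, 7, 8]
x2 = [1, 2, 3, 4, 5]
m1, m2 = mean(x1), mean(x2)
v1, v2 = variance(x1), variance(x2)  #sample variances (n - 1 denominator)
n1, n2 = len(x1), len(x2)

#"appr-mw" (McGraw-Wong): z = (m1 - m2) / sqrt(s1^2 + s2^2)
z_mw = (m1 - m2) / (v1 + v2)**0.5
cle_mw = NormalDist().cdf(z_mw)

#"appr-wh" (Wolfe-Hogg): pooled s, then z = (m1 - m2) / (sqrt(2) * s)
s = ((v1*(n1 - 1) + v2*(n2 - 1)) / (n1 + n2))**0.5
z_wh = (m1 - m2) / (2**0.5 * s)
cle_wh = NormalDist().cdf(z_wh)
```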
The "vda" uses the formula from Vargha and Delaney (2000, p. 107):
$$A = \\frac{1}{n_j}\\times\\left(\\frac{R_i}{n_i} - \\frac{n_i + 1}{2}\\right)$$
*with*
* \\(R_i\\) the sum of the ranks in category i
* \\(n_j\\) the number of scores in the other category
It could also be calculated from the Mann-Whitney U value:
$$A = \\frac{U}{n_1\\times n_2}$$
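A minimal standalone sketch of this rank-based route, using the same scipy rankdata as the implementation (made-up data; with midranks for tied scores the result matches the "brute" value):

```python
from scipy.stats import rankdata

x1 = [1, 2, 2, 3]
x2 = [2, 3, 4, 4]
n1, n2 = len(x1), len(x2)

ranks = rankdata(x1 + x2)  #midranks for tied scores
R1 = sum(ranks[:n1])
#Vargha-Delaney A for the first category
A1 = (R1 / n1 - (n1 + 1) / 2) / n2
#equivalently via the Mann-Whitney U value
U1 = R1 - n1 * (n1 + 1) / 2
print(A1, U1 / (n1 * n2))  # → 0.15625 0.15625
```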
Note that the two options (using category 1 or category 2) deviate equally from 0.5, in opposite directions. If all scores in the first category are lower than the scores in the second, A will be 0 using the first category, and 1 for the second.
If the number of pairs in which the first category's score is higher than the second's equals the number of pairs the other way around, A will be 0.5 (no matter which category is used).
The CLE can be converted to a Rank Biserial (= Cliff delta) using the **es_convert()** function. This can then be converted to a Cohen d, and then the rules-of-thumb for Cohen d could be used (**th_cohen_d()**)
The CLE for the other category is simply 1 - CLE, except for the case where ties are ignored ("brute-it").
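The relations behind these conversions can be sketched standalone; es_convert() and th_cohen_d() are the package's own helpers, while the code below only uses the standard relations (rank biserial = 2*CLE - 1, and CL = Phi(d/sqrt(2)) from McGraw and Wong) with a made-up CLE value:

```python
from statistics import NormalDist

cle = 0.15625
#rank biserial / Cliff delta
rb = 2 * cle - 1                        # → -0.6875
#Cohen d, inverting the McGraw-Wong relation CL = Phi(d / sqrt(2))
d = NormalDist().inv_cdf(cle) * 2**0.5
#the other category's CLE (when ties are split)
cle_other = 1 - cle                     # → 0.84375
```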
Before, After and Alternatives
------------------------------
Before determining this effect size measure, you might want to run a test:
* [ts_mann_whitney](../tests/test_mann_whitney.html#ts_mann_whitney) for the Mann-Whitney U test
* [ts_fligner_policello](../tests/test_fligner_policello.html#ts_fligner_policello) for the Fligner-Policello test
* [ts_brunner_munzel](../tests/test_brunner_munzel.html#ts_brunner_munzel) for the Brunner-Munzel test
* [ts_brunner_munzel_perm](../tests/test_brunner_munzel.html#ts_brunner_munzel_perm) for the Brunner-Munzel Permutation test
* [ts_c_square](../tests/test_c_square.html#ts_c_square) for the \\(C^2\\) test
After obtaining the coefficient you might want a rule-of-thumb:
* [th_cle()](../other/thumb_cle.html) for rules of thumb for the Common Language Effect Size
Alternative effect size measures could be:
* [me_hodges_lehmann_is](../measures/meas_hodges_lehmann_is.html#me_hodges_lehmann_is) for Hodges-Lehmann Estimate
* [r_rank_biserial_is](../correlations/cor_rank_biserial_is.html#r_rank_biserial_is) for the Rank Biserial correlation coefficient
References
----------
Grissom, R. J. (1994). Statistical analysis of ordinal categorical status after therapies. *Journal of Consulting and Clinical Psychology, 62*(2), 281–284. doi:10.1037/0022-006X.62.2.281
McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. *Psychological Bulletin, 111*(2), 361–365. doi:10.1037/0033-2909.111.2.361
Vargha, A., & Delaney, H. D. (2000). A critique and improvement of the CL common language effect size statistics of McGraw and Wong. *Journal of Educational and Behavioral Statistics, 25*(2), 101–132. doi:10.3102/10769986025002101
Wolfe, D. A., & Hogg, R. V. (1971). On constructing statistics and reporting data. *The American Statistician, 25*(4), 27–30. doi:10.1080/00031305.1971.10477278
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
'''
#convert to pandas series if needed
if type(catField) is list:
catField = pd.Series(catField)
if type(scores) is list:
scores = pd.Series(scores)
#combine as one dataframe
df = pd.concat([catField, scores], axis=1)
df = df.dropna()
df.columns=['cat', 'score']
#replace the ordinal values if levels is provided
if levels is not None:
df['score'] = df['score'].map(levels).astype('Int8')
#ensure the scores are numeric
df.iloc[:,1] = pd.to_numeric(df.iloc[:,1])
#the two categories
if categories is not None:
cat1 = categories[0]
cat2 = categories[1]
else:
cat1 = df.iloc[:,0].value_counts().index[0]
cat2 = df.iloc[:,0].value_counts().index[1]
#separate the scores for each category
x1 = list(df.iloc[:,1][df.iloc[:,0] == cat1])
x2 = list(df.iloc[:,1][df.iloc[:,0] == cat2])
#make sure they are floats
x1 = [float(x) for x in x1]
x2 = [float(x) for x in x2]
n1 = len(x1)
n2 = len(x2)
n = n1 + n2
var1 = variance(x1)
var2 = variance(x2)
m1 = mean(x1)
m2 = mean(x2)
if method=="appr-mw":
z = (m1 - m2 - dmu)/(var1 + var2)**0.5
c1 = NormalDist().cdf(z)
c2 = 1 - c1
elif method=="appr-wh":
se = 2**0.5 * ((var1*(n1 - 1) + var2*(n2 - 1))/n)**0.5
z = (m1 - m2 - dmu)/se
c1 = NormalDist().cdf(z)
c2 = 1 - c1
elif method=="vda":
#combine this into one long list
allScores = x1 + x2
#get the ranks
allRanks = rankdata(allScores)
#get the ranks per category
cat1Ranks = allRanks[0:n1]
cat2Ranks = allRanks[n1:n]
r1 = sum(cat1Ranks)
r2 = sum(cat2Ranks)
c1 = 1 / n2 * (r1 / n1 - (n1 + 1) / 2)
c2 = 1 / n1 * (r2 / n2 - (n2 + 1) / 2)
elif method=="brute" or method=="brute-it":
difs = [i - j for i in x1 for j in x2]
# total number of pairs
n = len(difs)
xGTy = sum([i > 0 for i in difs])
if method=="brute":
#Counting ties half to each
# number of pairs with score in first category being equal to the second
xEQy = sum([i == 0 for i in difs])
# probability
c1 = xGTy/n + 1/2*(xEQy/n)
c2 = 1 - c1
else:
c1 = xGTy/n
c2 = sum([i < 0 for i in difs])/n
#the results
colnames = ["CLE " + str(cat1), "CLE " + str(cat2)]
results = pd.DataFrame([[c1, c2]], columns=colnames)
return(results)