Module stikpetP.tests.test_cliff_delta_is
import pandas as pd
from scipy.stats import t
from statistics import NormalDist
def ts_cliff_delta_is(bin_field, ord_field, categories=None, levels=None, var_ver='unbiased', test_ver='cliff', round_df=False):
'''
Cliff Delta Test (independent samples)
--------------------------------------
The Cliff test is a test for stochastic equivalence. This means that even if the medians of two independent samples are equal, this test could still be significant.
Let's say we have one group A that scored 1, 2, 2, 5, 6, 6, 7, and another group B that scored 4, 4, 4, 5, 10, 10, 12. Each group has the same median (i.e. 5) and is symmetric around it, but if a high score is positive, most people would rather be in group B than in group A. This is where 'stochastic equality' comes in. It looks at the probability that, if you pick one random person from each of group A and group B, the one from group A scores lower than the one from group B, plus half the probability that they score equally. In this example that is about 0.68.
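That probability can be checked directly with a small brute-force sketch (plain Python, using the two example groups above):

```python
# Groups from the example above
A = [1, 2, 2, 5, 6, 6, 7]
B = [4, 4, 4, 5, 10, 10, 12]

# Probability that a random score from A is below a random score from B,
# plus half the probability that the two scores are tied
lower = sum(a < b for a in A for b in B)
ties = sum(a == b for a in A for b in B)
p_ab = (lower + 0.5 * ties) / (len(A) * len(B))
print(round(p_ab, 2))  # 0.68
```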
Cliff (1993) used a standard normal distribution, while Vargha and Delaney later used a t-distribution (Vargha, 2000; Vargha & Delaney, 2000; Delaney & Vargha, 2002).
This function is shown in this [YouTube video](https://youtu.be/q4tlBLStpeQ) and the test is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Tests/CliffDeltaTest.html).
Parameters
----------
bin_field : pandas series
data with categories for the rows
ord_field : pandas series
data with the scores (ordinal field)
categories : list or dictionary, optional
the two categories to use from bin_field. If not set the first two found will be used
levels : list or dictionary, optional
the scores in order
var_ver : {"unbiased", "consistent"}, optional
version of estimated variance to use.
test_ver : {'cliff', 'fligner-policello', 'brunner-munzel', 'fp', 'bm'}, optional
version of the test to use.
round_df : bool, optional
round degrees of freedom
Returns
-------
A dataframe with:
* *var*, the estimated overall variance
* *Cliff delta*, the Cliff Delta value
* *test-statistic*, the test statistic
* *df*, the degrees of freedom
* *p-value*, the significance (p-value), two-tailed
* *categories*, the two categories that were used
* *var used*, the variance that was used
* *test used*, description of the test used.
Notes
-----
The test-statistic for this test is (Cliff, 1993, p. 500):
$$C = \\frac{d}{\\sqrt{\\hat{\\sigma}_d^2}}$$
Which follows a normal distribution (Cliff, 1993, p. 500), i.e.
$$p = 2 \\times \\left(1 - \\Phi\\left(\\left|C\\right|\\right)\\right)$$
With:
$$d = \\frac{\\sum_{i=1}^{n_1}\\sum_{j=1}^{n_2} d_{x_i, y_j}}{n_1 \\times n_2}$$
$$d_{x_i, y_j} = S_{i,j}$$
$$S_{i,j} = \\text{sign}\\left(x_i - y_j\\right) = \\begin{cases} 1 & x_i \\gt y_j \\\\ 0 & x_i = y_j \\\\ -1 & x_i \\lt y_j \\end{cases}$$
$$\\hat{\\sigma}_d^2 = \\max{\\left(s_d^2, s_{d, min}^2\\right)}$$
$$s_{d, min}^2 = \\frac{1 - d^2}{n_1\\times n_2 - 1}$$
$$s_d^2 = \\frac{n_2^2 \\times SS_{\\hat{d}_{1}} + n_1^2 \\times SS_{\\hat{d}_{2}} - SS_{d_{xy}}}{n_1\\times n_2 \\times \\left(n_1 - 1\\right) \\times \\left(n_2 - 1\\right)}$$
$$SS_{\\hat{d}_{1}} = \\sum_{i=1}^{n_1} \\left(\\hat{d}_{i1} - d\\right)^2, SS_{\\hat{d}_{2}} = \\sum_{j=1}^{n_2} \\left(\\hat{d}_{j2} - d\\right)^2, SS_{d_{xy}} = \\sum_{i=1}^{n_1}\\sum_{j=1}^{n_2} \\left(d_{x_i, y_j} - d\\right)^2$$
$$\\hat{d}_{i1} = \\frac{1}{n_2} \\times \\sum_{j=1}^{n_2} \\begin{cases} 1 & x_i \\gt y_j \\\\ 0 & x_i = y_j \\\\ -1 & x_i \\lt y_j \\end{cases} = \\frac{1}{n_2} \\times \\sum_{j=1}^{n_2} S_{i,j}$$
$$\\hat{d}_{j2} = \\frac{1}{n_1} \\times \\sum_{i=1}^{n_1} \\begin{cases} 1 & x_i \\gt y_j \\\\ 0 & x_i = y_j \\\\ -1 & x_i \\lt y_j \\end{cases} = \\frac{1}{n_1} \\times \\sum_{i=1}^{n_1} S_{i,j}$$
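A minimal NumPy sketch of these definitions (NumPy is an assumption here for brevity; the function itself uses pandas), reusing the example groups from the introduction:

```python
import numpy as np

X = np.array([1, 2, 2, 5, 6, 6, 7])      # scores of the first category
Y = np.array([4, 4, 4, 5, 10, 10, 12])   # scores of the second category

S = np.sign(X[:, None] - Y[None, :])  # S_ij = sign(x_i - y_j), shape (n1, n2)
d = S.mean()                          # Cliff delta d
d_i1 = S.mean(axis=1)                 # row dominance scores d-hat_i1
d_j2 = S.mean(axis=0)                 # column dominance scores d-hat_j2
print(round(d, 3))  # -0.367
```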
Alternatively, but with the same result, the sample variance of d can be calculated with:
$$s_d^2 = \\frac{n_2}{n_1} \\times \\frac{s_{\\hat{d}_1}^2}{n_2 - 1} + \\frac{n_1}{n_2} \\times \\frac{s_{\\hat{d}_2}^2}{n_1 - 1} - \\frac{s_{d_{xy}}^2}{n_1 \\times n_2}$$
$$s_{\\hat{d}_1}^2 = \\frac{SS_{\\hat{d}_{1}}}{n_1 -1}, s_{\\hat{d}_2}^2 = \\frac{SS_{\\hat{d}_{2}}}{n_2 -1}, s_{d_{xy}}^2 = \\frac{SS_{d_{xy}}}{\\left(n_1 - 1\\right) \\times \\left(n_2 - 1\\right)}$$
A different ('consistent') estimate is given by (Cliff, 1993, p. 499, eq. 7):
$$\\hat{\\sigma}_d^2 = \\frac{\\left(n_2 - 1\\right) \\times s_{\\hat{d}_1}^2 + \\left(n_1 - 1\\right) \\times s_{\\hat{d}_2}^2 + s_{d_{xy}}^2}{n_1 \\times n_2}$$
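Both variance estimates can be sketched as follows (a NumPy illustration of the formulas above, not the function's pandas implementation); note that the two expressions for the sample variance of d given above are algebraically identical:

```python
import numpy as np

X = np.array([1, 2, 2, 5, 6, 6, 7])
Y = np.array([4, 4, 4, 5, 10, 10, 12])
n1, n2 = len(X), len(Y)

S = np.sign(X[:, None] - Y[None, :])
d = S.mean()

SS_d1 = ((S.mean(axis=1) - d) ** 2).sum()   # SS of row dominance scores
SS_d2 = ((S.mean(axis=0) - d) ** 2).sum()   # SS of column dominance scores
SS_dxy = ((S - d) ** 2).sum()               # SS of all pairwise d values

# 'unbiased' version, with the minimum-variance safeguard
s_d2 = (n2**2 * SS_d1 + n1**2 * SS_d2 - SS_dxy) / (n1 * n2 * (n1 - 1) * (n2 - 1))
s_d2_min = (1 - d**2) / (n1 * n2 - 1)
var_unbiased = max(s_d2, s_d2_min)

# 'consistent' version (Cliff, 1993, eq. 7)
s2_d1 = SS_d1 / (n1 - 1)
s2_d2 = SS_d2 / (n2 - 1)
s2_dxy = SS_dxy / ((n1 - 1) * (n2 - 1))
var_consistent = ((n2 - 1) * s2_d1 + (n1 - 1) * s2_d2 + s2_dxy) / (n1 * n2)
```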
Vargha (2000, p. 280) and Vargha and Delaney (2000, p. 7, eq. 9) use a t-distribution instead of the standard normal distribution. They use the same test statistic, and as degrees of freedom they use:
$$p = 2 \\times \\left(1 - t\\left(\\left|C\\right|, df\\right)\\right)$$
$$df = \\frac{\\left(a + b\\right)^2}{\\frac{a^2}{n_1 - 1} + \\frac{b^2}{n_2 - 1}}$$
With:
$$a_{BM} = \\frac{1}{n_1} \\times \\frac{s_{R_1^*}^2}{n_2^2}, b_{BM} = \\frac{1}{n_2} \\times \\frac{s_{R_2^*}^2}{n_1^2}$$
$$s_{R_1^*}^2 = \\frac{SS_{R_1^*}}{n_1 -1}, s_{R_2^*}^2 = \\frac{SS_{R_2^*}}{n_2 -1}$$
$$SS_{R_1^*} = \\sum_{i=1}^{n_1} \\left(r_{i1}^* - \\bar{R}_1^*\\right)^2, SS_{R_2^*} = \\sum_{j=1}^{n_2} \\left(r_{j2}^* - \\bar{R}_2^*\\right)^2$$
$$\\bar{R}_1^* = \\frac{\\sum_{i=1}^{n_1} r_{i1}^*}{n_1}, \\bar{R}_2^* = \\frac{\\sum_{j=1}^{n_2} r_{j2}^*}{n_2}$$
$$r_{i,1}^* = \\sum_{j=1}^{n_2} \\begin{cases} 1 & s_{i,j} = 1 \\\\ 0.5 & s_{i,j} = 0 \\\\ 0 & s_{i,j} = -1 \\end{cases}$$
$$r_{j,2}^* = \\sum_{i=1}^{n_1} \\begin{cases} 1 & s_{i,j} = -1 \\\\ 0.5 & s_{i,j} = 0 \\\\ 0 & s_{i,j} = 1 \\end{cases}$$
This is in line with the Brunner-Munzel test (note the BM subscripts), and is used when *test_ver='brunner-munzel'* (or *'bm'*).
Delaney and Vargha (2002) also proposed an alternative degrees of freedom, in line with the Fligner-Policello test, used when *test_ver='fligner-policello'* (or *'fp'*), as:
$$a_{FP} = \\frac{s_{R_1^*}^2}{n_1}, b_{FP} = \\frac{s_{R_2^*}^2}{n_2}$$
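The placement values and both sets of a and b terms can be sketched as follows (NumPy, same example data; the variable subscripts follow the formulas above):

```python
import numpy as np

X = np.array([1, 2, 2, 5, 6, 6, 7])
Y = np.array([4, 4, 4, 5, 10, 10, 12])
n1, n2 = len(X), len(Y)
S = np.sign(X[:, None] - Y[None, :])

# placement values r*_i1 and r*_j2
r1 = (S == 1).sum(axis=1) + 0.5 * (S == 0).sum(axis=1)
r2 = (S == -1).sum(axis=0) + 0.5 * (S == 0).sum(axis=0)
s2_r1 = r1.var(ddof=1)
s2_r2 = r2.var(ddof=1)

# a and b terms, BM- and FP-subscripted versions
a_bm, b_bm = s2_r1 / (n1 * n2**2), s2_r2 / (n2 * n1**2)
a_fp, b_fp = s2_r1 / n1, s2_r2 / n2

def welch_df(a, b):
    # Satterthwaite/Welch-style degrees of freedom
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))

df_bm, df_fp = welch_df(a_bm, b_bm), welch_df(a_fp, b_fp)
```

In this particular example the two placement variances happen to be equal and n1 = n2, so both versions give df = n1 + n2 - 2 = 12; with unequal group sizes or variances they differ.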
*Symbols used:*
* \\(n_{k}\\), the number of scores in category k
* \\(\\Phi\\left(\\dots\\right)\\), the cumulative distribution function of the standard normal distribution
* \\(t\\left(\\dots\\right)\\), the cumulative distribution function of the t distribution
A few additional notes.
Vargha and Delaney (2000, p. 7, eq. 9) mention that "df is rounded to the nearest integer", which is why this function allows rounding of the degrees of freedom (df).
The \\(r^*\\) values are so-called placement values. These can also be calculated by subtracting the within-group mid-rank from the pooled mid-rank.
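This mid-rank relation can be illustrated with `scipy.stats.rankdata` (a sketch with the example groups; the point is that both routes give the same placement values):

```python
import numpy as np
from scipy.stats import rankdata

X = np.array([1, 2, 2, 5, 6, 6, 7])
Y = np.array([4, 4, 4, 5, 10, 10, 12])

# placement values via counting the sign matrix
S = np.sign(X[:, None] - Y[None, :])
r1_counts = (S == 1).sum(axis=1) + 0.5 * (S == 0).sum(axis=1)

# the same values via mid-ranks: pooled mid-rank minus within-group mid-rank
pooled = np.concatenate([X, Y])
r1_ranks = rankdata(pooled)[:len(X)] - rankdata(X)

print(np.allclose(r1_counts, r1_ranks))  # True
```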
The \\(d\\) value is known as Cliff delta and is the same as the (Glass) rank biserial correlation coefficient. It is sometimes used as an effect size measure in its own right, and various methods are available to calculate it.
For the \\(\\hat{d}_{j2}\\) Cliff (1993) mentions: " \\( d_{.j} \\) represents the proportion of scores from the first population that lies **above** a given score from the second, minus the reverse " (p. 499), which is the one used here. However, Delaney and Vargha (2002) wrote: " \\(d_{.j}\\) denotes the proportion of the \\(X\\) scores that lie **below** \\(Y_j\\) minus the proportion that lie above" (p. 9), but this is most likely a mistake.
Before, After and Alternatives
------------------------------
Before running the test you might first want to get an impression using a cross table and/or some visualisation:
* [tab_cross](../other/table_cross.html#tab_cross)
* [vi_bar_stacked_multiple](../visualisations/vis_bar_stacked_multiple.html#vi_bar_stacked_multiple)
After the test you might want an effect size measure:
* [es_common_language_is](../effect_sizes/eff_size_common_language_is.html#es_common_language_is) for Common Language Effect Size
* [me_hodges_lehmann_is](../measures/meas_hodges_lehmann_is.html#me_hodges_lehmann_is) for Hodges-Lehmann estimate
* [r_rank_biserial_is](../correlations/cor_rank_biserial_is.html#r_rank_biserial_is) for (Glass) Rank Biserial (Cliff delta)
Alternatives for this test could be:
* [ts_mann_whitney](../tests/test_mann_whitney.html#ts_mann_whitney) for the Mann-Whitney U test
* [ts_fligner_policello](../tests/test_fligner_policello.html#ts_fligner_policello) for the Fligner-Policello test
* [ts_brunner_munzel](../tests/test_brunner_munzel.html#ts_brunner_munzel) for the Brunner-Munzel test
* [ts_brunner_munzel_perm](../tests/test_brunner_munzel.html#ts_brunner_munzel_perm) for the Brunner-Munzel Permutation test
* [ts_c_square](../tests/test_c_square.html#ts_c_square) for the \\(C^2\\) test
References
----------
Cliff, N. (1993). Dominance statistics: Ordinal analyses to answer ordinal questions. *Psychological Bulletin, 114*(3), 494–509. doi:10.1037/0033-2909.114.3.494
Delaney, H. D., & Vargha, A. (2002). Comparing several robust tests of stochastic equality with ordinally scaled variables and small to moderate sized samples. *Psychological Methods, 7*(4), 485–503. doi:10.1037/1082-989X.7.4.485
Vargha A. (2000). Két pszichológiai populáció sztochasztikus egyenlőségének ellenőrzésére alkalmas statisztikai próbák összehasonlító vizsgálata. *Magyar Pszichológiai Szemle, 55*(2–3), Article 2–3. doi:10/1/mpszle.55.2000.2-3.5.pdf
Vargha, A., & Delaney, H. D. (2000). Comparing several robust tests of stochastic equality. https://eric.ed.gov/?id=ED441836
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
'''
    # DATA PREPARATION
    # convert to pandas series if needed
    if type(bin_field) is list:
        bin_field = pd.Series(bin_field)
    if type(ord_field) is list:
        ord_field = pd.Series(ord_field)
    # combine as one dataframe
    dfr = pd.concat([bin_field, ord_field], axis=1)
    dfr = dfr.dropna()
    dfr.columns = ['cat', 'score']
    # replace the ordinal values if levels is provided
    if levels is not None:
        dfr['score'] = dfr['score'].map(levels).astype('Int8')
    else:
        dfr.iloc[:, 1] = pd.to_numeric(dfr.iloc[:, 1])
    # the two categories
    if categories is not None:
        cat1 = categories[0]
        cat2 = categories[1]
    else:
        cat1 = dfr.iloc[:, 0].value_counts().index[0]
        cat2 = dfr.iloc[:, 0].value_counts().index[1]
    cats = cat1 + ', ' + cat2
    # remove scores not in either of the two categories
    dfr = dfr[dfr['cat'].isin([cat1, cat2])]
    # separate the scores for each category
    X = list(dfr.iloc[:, 1][dfr.iloc[:, 0] == cat1])
    Y = list(dfr.iloc[:, 1][dfr.iloc[:, 0] == cat2])
    n1 = len(X)
    n2 = len(Y)
    N = n1 + n2
    X = pd.Series(X)
    Y = pd.Series(Y)
    # create the difference matrix: each row from X minus each column from Y
    diff = X.values[:, None] - Y.values[None, :]
    # create DataFrame from the difference matrix
    df_diff = pd.DataFrame(diff, index=X.index, columns=Y.index)
    # apply the sign operation without defining a function
    S = (df_diff > 0).astype(int) - (df_diff < 0).astype(int)
    di1 = S.sum(axis=1) / n2
    dj2 = S.sum(axis=0) / n1
    d = di1.mean()
    SSd1 = di1.var(ddof=0) * n1
    SSd2 = dj2.var(ddof=0) * n2
    SSd = S.values.flatten().var(ddof=0) * (n1 * n2)
    if var_ver == 'unbiased':
        var_used = 'unbiased'
        var_d_v1 = (n2**2 * SSd1 + n1**2 * SSd2 - SSd) / (n1 * n2 * (n1 - 1) * (n2 - 1))
        var_d_min = (1 - d**2) / (n1 * n2 - 1)
        var = max(var_d_v1, var_d_min)
    elif var_ver == 'consistent':
        var_used = 'consistent'
        var_d1 = SSd1 / (n1 - 1)
        var_d2 = SSd2 / (n2 - 1)
        var_d = SSd / ((n1 - 1) * (n2 - 1))
        var = ((n2 - 1) * var_d1 + (n1 - 1) * var_d2 + var_d) / (n1 * n2)
    C = d / (var**0.5)
    if test_ver == 'cliff':
        df = 'n.a.'
        test_used = 'Cliff Delta test with standard normal distribution'
        p = 2 * (1 - NormalDist().cdf(abs(C)))
    else:
        r_ast_1 = (S == 1).sum(axis=1) + 1/2 * (S == 0).sum(axis=1)
        r_ast_2 = (S == -1).sum(axis=0) + 1/2 * (S == 0).sum(axis=0)
        var_R_ast_1 = r_ast_1.var(ddof=1)
        var_R_ast_2 = r_ast_2.var(ddof=1)
        if test_ver == 'fligner-policello' or test_ver == 'fp':
            test_used = 'Cliff Delta test with t distribution, and Fligner-Policello df'
            a = var_R_ast_1 / n1
            b = var_R_ast_2 / n2
        elif test_ver == 'brunner-munzel' or test_ver == 'bm':
            test_used = 'Cliff Delta test with t distribution, and Brunner-Munzel df'
            a = 1/n1 * var_R_ast_1 / n2**2
            b = 1/n2 * var_R_ast_2 / n1**2
        # degrees of freedom
        df = (a + b)**2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
        if round_df:
            df = round(df, 0)
        p = 2 * (1 - t.cdf(abs(C), df))
    results = {'var': var, 'Cliff delta': d, 'test statistic': C, 'df': df, 'p-value': p, 'categories': cats, 'var used': var_used, 'test used': test_used}
    results_pd = pd.DataFrame([results])
    return results_pd
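As a quick end-to-end sketch, the default ('cliff') branch above can be reproduced in a few lines (NumPy used for brevity instead of the pandas implementation; same example groups as in the docstring):

```python
import numpy as np
from statistics import NormalDist

X = np.array([1, 2, 2, 5, 6, 6, 7])      # e.g. group A
Y = np.array([4, 4, 4, 5, 10, 10, 12])   # e.g. group B
n1, n2 = len(X), len(Y)

S = np.sign(X[:, None] - Y[None, :])
d = S.mean()                                   # Cliff delta
SS_d1 = n1 * S.mean(axis=1).var()              # population variance times n1
SS_d2 = n2 * S.mean(axis=0).var()
SS_dxy = (n1 * n2) * S.var()
s_d2 = (n2**2 * SS_d1 + n1**2 * SS_d2 - SS_dxy) / (n1 * n2 * (n1 - 1) * (n2 - 1))
var = max(s_d2, (1 - d**2) / (n1 * n2 - 1))    # 'unbiased' with safeguard
C = d / var**0.5                               # test statistic
p = 2 * (1 - NormalDist().cdf(abs(C)))         # two-sided p-value
```

For this data Cliff delta is about -0.37 with a two-sided p-value around 0.23: despite equal medians, the groups are not shown to be stochastically equal at conventional levels here, mainly because the samples are tiny.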