Module stikpetP.tests.test_cliff_delta_is

Expand source code
import pandas as pd
from scipy.stats import t
from statistics import NormalDist

def ts_cliff_delta_is(bin_field, ord_field, categories=None, levels=None, var_ver='unbiased', test_ver='cliff', round_df=False):
    '''
    Cliff Delta Test (independent samples)
    --------------------------------------
    The Cliff test is a test of stochastic equivalence. This means that even if the medians of two independent samples are equal, this test could still be significant.
    
    Let's say we have one group A that scored 1, 2, 2, 5, 6, 6, 7, and another group B that scored 4, 4, 4, 5, 10, 10, 12. Each group has the same median (5) and is symmetric around it, yet if a high score is desirable, most people would rather be in group B than in group A. This is where 'stochastic equality' comes in: it is the probability that, if you pick one random person from each group, the one from group A scores lower than the one from group B, plus half the probability that they score the same. In this example that is about 0.68.
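    As a quick check, this probability can be computed directly by counting pairs (a plain-Python sketch, not part of this function):

```python
# Stochastic-equality probability for the two example groups:
# P(A < B) + 0.5 * P(A = B), counted over all pairs.
A = [1, 2, 2, 5, 6, 6, 7]
B = [4, 4, 4, 5, 10, 10, 12]

lower = sum(a < b for a in A for b in B)   # pairs where the A score is lower
ties = sum(a == b for a in A for b in B)   # tied pairs
p_sup = (lower + 0.5 * ties) / (len(A) * len(B))
print(round(p_sup, 2))  # 0.68
```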
    
    Cliff (1993) used a standard normal distribution, while Vargha and Delaney later used a t-distribution (Vargha, 2000; Vargha & Delaney, 2000; Delaney & Vargha, 2002).

    This function is shown in this [YouTube video](https://youtu.be/q4tlBLStpeQ) and the test is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Tests/CliffDeltaTest.html).

    Parameters
    ----------
    bin_field : pandas series
        data with categories for the rows
    ord_field : pandas series
        data with the scores (ordinal field)
    categories : list or dictionary, optional
        the two categories to use from bin_field. If not set the first two found will be used
    levels : list or dictionary, optional
        the scores in order
    var_ver : {"unbiased", "consistent"}, optional
        version of estimated variance to use.
    test_ver : {'cliff', 'fligner-policello', 'brunner-munzel', 'fp', 'bm'}, optional
        version of the test to use.
    round_df : bool, optional
        round degrees of freedom

    Returns
    -------
    A dataframe with:
    
    * *var*, the estimated overall variance
    * *Cliff delta*, the Cliff Delta value
    * *test-statistic*, the test statistic
    * *df*, the degrees of freedom
    * *p-value*, the significance (p-value), two-tailed
    * *categories*, the two categories that were used
    * *var used*, the variance that was used
    * *test used*, description of the test used.
    
    Notes
    -----
    The test-statistic for this test is (Cliff, 1993, p. 500):
    $$C = \\frac{d}{\\sqrt{\\hat{\\sigma}_d^2}}$$

    Which follows a normal distribution (Cliff, 1993, p. 500), i.e. 
    $$p = 2 \\times \\left(1 - \\Phi\\left(\\left|C\\right|\\right)\\right)$$

    With:
    $$d = \\frac{\\sum_{i=1}^{n_1}\\sum_{j=1}^{n_2} d_{x_i, y_j}}{n_1 \\times n_2}$$
    $$d_{x_i, y_j} = S_{i,j}$$
    $$S_{i,j} = \\text{sign}\\left(x_i - y_j\\right) = \\begin{cases} 1 & x_i \\gt y_j \\\\ 0 & x_i = y_j \\\\ -1 & x_i \\lt y_j \\end{cases}$$
    $$\\hat{\\sigma}_d^2 = \\max{\\left(s_d^2, s_{d, min}^2\\right)}$$
    $$s_{d, min}^2 = \\frac{1 - d^2}{n_1\\times n_2 - 1}$$
    $$s_d^2 = \\frac{n_2^2 \\times SS_{\\hat{d}_{1}} + n_1^2 \\times SS_{\\hat{d}_{2}} - SS_{d_{xy}}}{n_1\\times n_2 \\times \\left(n_1 - 1\\right) \\times \\left(n_2 - 1\\right)}$$
    $$SS_{\\hat{d}_{1}} = \\sum_{i=1}^{n_1} \\left(\\hat{d}_{i1} - d\\right)^2, SS_{\\hat{d}_{2}} = \\sum_{j=1}^{n_2} \\left(\\hat{d}_{j2} - d\\right)^2, SS_{d_{xy}} = \\sum_{i=1}^{n_1}\\sum_{j=1}^{n_2} \\left(d_{x_i, y_j} - d\\right)^2$$
    $$\\hat{d}_{i1} = \\frac{1}{n_2} \\times \\sum_{j=1}^{n_2} \\begin{cases} 1 & x_i \\gt y_j \\\\ 0 & x_i = y_j \\\\ -1 & x_i \\lt y_j \\end{cases} = \\frac{1}{n_2} \\times \\sum_{j=1}^{n_2} S_{i,j}$$
    $$\\hat{d}_{j2} = \\frac{1}{n_1} \\times \\sum_{i=1}^{n_1} \\begin{cases} 1 & x_i \\gt y_j \\\\ 0 & x_i = y_j \\\\ -1 & x_i \\lt y_j \\end{cases} = \\frac{1}{n_1} \\times \\sum_{i=1}^{n_1} S_{i,j}$$
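    These steps can be traced in plain Python for the example groups (a sketch of the 'unbiased' variance route; all names are illustrative, not part of this function):

```python
from statistics import NormalDist

A = [1, 2, 2, 5, 6, 6, 7]     # group 1 (x)
B = [4, 4, 4, 5, 10, 10, 12]  # group 2 (y)
n1, n2 = len(A), len(B)

def sign(v):
    return (v > 0) - (v < 0)

S = [[sign(x - y) for y in B] for x in A]   # S_ij = sign(x_i - y_j)
d = sum(map(sum, S)) / (n1 * n2)            # Cliff delta

d_i1 = [sum(row) / n2 for row in S]         # row dominances, d-hat_i1
d_j2 = [sum(S[i][j] for i in range(n1)) / n1 for j in range(n2)]

SS_d1 = sum((v - d) ** 2 for v in d_i1)
SS_d2 = sum((v - d) ** 2 for v in d_j2)
SS_dxy = sum((s - d) ** 2 for row in S for s in row)

s_d2 = (n2**2 * SS_d1 + n1**2 * SS_d2 - SS_dxy) / (n1 * n2 * (n1 - 1) * (n2 - 1))
s_d_min2 = (1 - d**2) / (n1 * n2 - 1)
var_d = max(s_d2, s_d_min2)                 # estimated variance of d

C = d / var_d ** 0.5                        # test statistic
p = 2 * (1 - NormalDist().cdf(abs(C)))      # two-tailed normal p-value
print(round(d, 4))  # -0.3673
```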

    Alternatively, with the same result, the sample variance of d can be calculated with:
    $$s_d^2 = \\frac{n_2}{n_1} \\times \\frac{s_{\\hat{d}_1}^2}{n_2 - 1} + \\frac{n_1}{n_2} \\times \\frac{s_{\\hat{d}_2}^2}{n_1 - 1} - \\frac{s_{d_{xy}}^2}{n_1 \\times n_2}$$
    $$s_{\\hat{d}_1}^2 = \\frac{SS_{\\hat{d}_{1}}}{n_1 -1}, s_{\\hat{d}_2}^2 = \\frac{SS_{\\hat{d}_{2}}}{n_2 -1}, s_{d_{xy}}^2 = \\frac{SS_{d_{xy}}}{\\left(n_1 - 1\\right) \\times \\left(n_2 - 1\\right)}$$

    A different ('consistent') estimate is given by (Cliff, 1993, p. 499, eq. 7):
    $$\\hat{\\sigma}_d^2 = \\frac{\\left(n_2 - 1\\right) \\times s_{\\hat{d}_1}^2 + \\left(n_1 - 1\\right) \\times s_{\\hat{d}_2}^2 + s_{d_{xy}}^2}{n_1 \\times n_2}$$
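    As a check, both routes to \\(s_d^2\\) and the 'consistent' estimate can be computed side by side (plain-Python sketch with illustrative names):

```python
A = [1, 2, 2, 5, 6, 6, 7]
B = [4, 4, 4, 5, 10, 10, 12]
n1, n2 = len(A), len(B)

def sign(v):
    return (v > 0) - (v < 0)

S = [[sign(x - y) for y in B] for x in A]
d = sum(map(sum, S)) / (n1 * n2)
d_i1 = [sum(row) / n2 for row in S]
d_j2 = [sum(S[i][j] for i in range(n1)) / n1 for j in range(n2)]

SS_d1 = sum((v - d) ** 2 for v in d_i1)
SS_d2 = sum((v - d) ** 2 for v in d_j2)
SS_dxy = sum((s - d) ** 2 for row in S for s in row)

# sample variances from the sums of squares
s2_d1 = SS_d1 / (n1 - 1)
s2_d2 = SS_d2 / (n2 - 1)
s2_dxy = SS_dxy / ((n1 - 1) * (n2 - 1))

# alternative form of s_d^2, algebraically equal to the direct form
s2_d_alt = (n2 / n1) * s2_d1 / (n2 - 1) + (n1 / n2) * s2_d2 / (n1 - 1) - s2_dxy / (n1 * n2)
s2_d_direct = (n2**2 * SS_d1 + n1**2 * SS_d2 - SS_dxy) / (n1 * n2 * (n1 - 1) * (n2 - 1))

# 'consistent' estimate (Cliff, 1993, eq. 7)
var_cons = ((n2 - 1) * s2_d1 + (n1 - 1) * s2_d2 + s2_dxy) / (n1 * n2)
print(abs(s2_d_alt - s2_d_direct) < 1e-12)  # True: both routes agree
```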

    Vargha (2000, p. 280) and also Vargha and Delaney (2000, p. 7, eq. 9) use a t-distribution, instead of the standard normal distribution. They use the same test-statistic, and as degrees of freedom they use:
    $$p = 2 \\times \\left(1 - t\\left(\\left|C\\right|, df\\right)\\right)$$
    $$df = \\frac{\\left(a + b\\right)^2}{\\frac{a^2}{n_1 - 1} + \\frac{b^2}{n_2 - 1}}$$
    
    With:
    $$a_{BM} = \\frac{1}{n_1} \\times \\frac{s_{R_1^*}^2}{n_2^2}, b_{BM} = \\frac{1}{n_2} \\times \\frac{s_{R_2^*}^2}{n_1^2}$$
    $$s_{R_1^*}^2 = \\frac{SS_{R_1^*}}{n_1 -1}, s_{R_2^*}^2 = \\frac{SS_{R_2^*}}{n_2 -1}$$
    $$SS_{R_1^*} = \\sum_{i=1}^{n_1} \\left(r_{i1}^* - \\bar{R}_1^*\\right)^2, SS_{R_2^*} = \\sum_{j=1}^{n_2} \\left(r_{j2}^* - \\bar{R}_2^*\\right)^2$$
    $$\\bar{R}_1^* = \\frac{\\sum_{i=1}^{n_1} r_{i1}^*}{n_1}, \\bar{R}_2^* = \\frac{\\sum_{j=1}^{n_2} r_{j2}^*}{n_2}$$
    $$r_{i1}^* = \\sum_{j=1}^{n_2} \\begin{cases} 1 & S_{i,j} = 1 \\\\  0.5 & S_{i,j} = 0 \\\\  0 & S_{i,j} = -1 \\end{cases}$$
    $$r_{j2}^* = \\sum_{i=1}^{n_1} \\begin{cases} 1 & S_{i,j} = -1 \\\\  0.5 & S_{i,j} = 0 \\\\  0 & S_{i,j} = 1 \\end{cases}$$

    This is in line with the Brunner-Munzel test, and is used when *test_ver='brunner-munzel'* (or *'bm'*).

    Delaney and Vargha (2002) also proposed an alternative degrees of freedom, in line with the Fligner-Policello test (*test_ver='fligner-policello'* or *'fp'*), as:
    $$a_{FP} = \\frac{s_{R_1^*}^2}{n_1}, b_{FP} = \\frac{s_{R_2^*}^2}{n_2}$$
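    Both degrees-of-freedom variants can be sketched from the placement values (plain-Python sketch with illustrative names):

```python
from statistics import variance

A = [1, 2, 2, 5, 6, 6, 7]
B = [4, 4, 4, 5, 10, 10, 12]
n1, n2 = len(A), len(B)

# placement values: wins plus half-ties against the other group
r1 = [sum(y < x for y in B) + 0.5 * sum(y == x for y in B) for x in A]
r2 = [sum(x < y for x in A) + 0.5 * sum(x == y for x in A) for y in B]
s2_r1, s2_r2 = variance(r1), variance(r2)  # sample variances (ddof=1)

def welch_df(a, b):
    # Welch-Satterthwaite style degrees of freedom
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))

df_fp = welch_df(s2_r1 / n1, s2_r2 / n2)                      # a_FP, b_FP
df_bm = welch_df(s2_r1 / (n1 * n2**2), s2_r2 / (n2 * n1**2))  # a_BM, b_BM
# both are 12.0 here: equal group sizes with equal placement
# variances reduce the formula to n1 + n2 - 2
print(df_fp, df_bm)
```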

    *Symbols used:*
    
    * \\(n_{k}\\), the number of scores in category k
    * \\(\\Phi\\left(\\dots\\right)\\), the cumulative distribution function of the standard normal distribution
    * \\(t\\left(\\dots\\right)\\), the cumulative distribution function of the t distribution

    A few additional notes.
    
    Vargha and Delaney (2000, p. 7, eq. 9) mention that "df is rounded to the nearest integer", which is why this function allows rounding of the degrees of freedom (df).

    The \\(r^*\\) values are so-called placement values. These can also be calculated by subtracting the within-group mid-rank from the pooled mid-rank.
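    This relation can be verified with a small sketch (plain Python; the mid-rank helper and names are illustrative):

```python
A = [1, 2, 2, 5, 6, 6, 7]
B = [4, 4, 4, 5, 10, 10, 12]

def midrank(value, data):
    # mid-rank of `value` within `data`: count below, plus half the ties, plus 0.5
    return sum(v < value for v in data) + (sum(v == value for v in data) + 1) / 2

pooled = A + B
# placement value as pooled mid-rank minus within-group mid-rank
r1_via_ranks = [midrank(x, pooled) - midrank(x, A) for x in A]
# direct definition: wins plus half-ties against the other group
r1_direct = [sum(y < x for y in B) + 0.5 * sum(y == x for y in B) for x in A]
print(r1_via_ranks == r1_direct)  # True
```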

    The \\(d\\) value is known as Cliff's delta, and is the same as the (Glass) rank-biserial correlation coefficient. It is itself sometimes used as an effect-size measure, and various methods are available to calculate it.
    

    For \\(\\hat{d}_{j2}\\), Cliff (1993) states: "\\(d_{.j}\\) represents the proportion of scores from the first population that lies **above** a given score from the second, minus the reverse" (p. 499), which is the definition used here. Delaney and Vargha (2002), however, wrote: "\\(d_{.j}\\) denotes the proportion of the \\(X\\) scores that lie **below** \\(Y_j\\) minus the proportion that lie above" (p. 9), but this is most likely a mistake.

    Before, After and Alternatives
    ------------------------------
    Before running the test you might first want to get an impression using a cross table and/or some visualisation:
    
    * [tab_cross](../other/table_cross.html#tab_cross)
    * [vi_bar_stacked_multiple](../visualisations/vis_bar_stacked_multiple.html#vi_bar_stacked_multiple)

    After the test you might want an effect size measure:
    
    * [es_common_language_is](../effect_sizes/eff_size_common_language_is.html#es_common_language_is) for Common Language Effect Size
    * [me_hodges_lehmann_is](../measures/meas_hodges_lehmann_is.html#me_hodges_lehmann_is) for Hodges-Lehmann estimate
    * [r_rank_biserial_is](../correlations/cor_rank_biserial_is.html#r_rank_biserial_is) for (Glass) Rank Biserial (Cliff delta)

    Alternatives for this test could be:
    
    * [ts_mann_whitney](../tests/test_mann_whitney.html#ts_mann_whitney) for the Mann-Whitney U test
    * [ts_fligner_policello](../tests/test_fligner_policello.html#ts_fligner_policello) for the Fligner-Policello test
    * [ts_brunner_munzel](../tests/test_brunner_munzel.html#ts_brunner_munzel) for the Brunner-Munzel test
    * [ts_brunner_munzel_perm](../tests/test_brunner_munzel.html#ts_brunner_munzel_perm) for the Brunner-Munzel Permutation test
    * [ts_c_square](../tests/test_c_square.html#ts_c_square) for the \\(C^2\\) test

    References
    ----------
    Cliff, N. (1993). Dominance statistics: Ordinal analyses to answer ordinal questions. *Psychological Bulletin, 114*(3), 494–509. doi:10.1037/0033-2909.114.3.494
    
    Delaney, H. D., & Vargha, A. (2002). Comparing several robust tests of stochastic equality with ordinally scaled variables and small to moderate sized samples. *Psychological Methods, 7*(4), 485–503. doi:10.1037/1082-989X.7.4.485
    
    Vargha A. (2000). Két pszichológiai populáció sztochasztikus egyenlőségének ellenőrzésére alkalmas statisztikai próbák összehasonlító vizsgálata [A comparative study of statistical tests suitable for testing the stochastic equality of two psychological populations]. *Magyar Pszichológiai Szemle, 55*(2–3), Article 2–3. doi:10/1/mpszle.55.2000.2-3.5.pdf
    
    Vargha, A., & Delaney, H. D. (2000). Comparing several robust tests of stochastic equality. https://eric.ed.gov/?id=ED441836

    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076
    
    '''
    # DATA PREPARATION
    #convert to pandas series if needed
    if isinstance(bin_field, list):
        bin_field = pd.Series(bin_field)
    
    if isinstance(ord_field, list):
        ord_field = pd.Series(ord_field)
    
    #combine as one dataframe
    dfr = pd.concat([bin_field, ord_field], axis=1)
    dfr = dfr.dropna()
    dfr.columns=['cat', 'score']
    
    # replace the ordinal values if levels is provided
    if levels is not None:
        dfr['score'] = dfr['score'].map(levels).astype('Int64')
    else:
        dfr['score'] = pd.to_numeric(dfr['score'])
    
    #the two categories
    if categories is not None:
        cat1 = categories[0]
        cat2 = categories[1]
    else:
        cat1 = dfr.iloc[:,0].value_counts().index[0]
        cat2 = dfr.iloc[:,0].value_counts().index[1]
    cats = f"{cat1}, {cat2}"
    
    # remove scores not in either of the two categories
    dfr = dfr[dfr['cat'].isin([cat1, cat2])]

    # separate the scores for each category
    X = list(dfr.iloc[:,1][dfr.iloc[:,0] == cat1])
    Y = list(dfr.iloc[:,1][dfr.iloc[:,0] == cat2])
    n1 = len(X)
    n2 = len(Y)
    N = n1 + n2
    
    X = pd.Series(X)
    Y = pd.Series(Y)
    
    # Create the difference matrix: each row from X minus each column from Y
    diff = X.values[:, None] - Y.values[None, :]
    
    # Create DataFrame from the difference matrix
    df_diff = pd.DataFrame(diff, index=X.index, columns=Y.index)
    
    # Apply sign operation without defining a function
    S = (df_diff > 0).astype(int) - (df_diff < 0).astype(int)
    
    
    di1 = S.sum(axis=1)/n2  # row dominances, d-hat_i1
    dj2 = S.sum(axis=0)/n1  # column dominances, d-hat_j2
    d = di1.mean()          # Cliff delta
    
    # sums of squares (population variance times n equals the sum of squares)
    SSd1 = di1.var(ddof=0) * n1
    SSd2 = dj2.var(ddof=0) * n2
    SSd = S.values.flatten().var(ddof=0)*(n1*n2)

    if var_ver=='unbiased':
        var_used = 'unbiased'
        var_d_v1 = (n2**2 * SSd1 + n1**2 * SSd2 - SSd) / (n1*n2*(n1 - 1)*(n2 - 1))
        var_d_min = (1 - d**2)/(n1*n2 - 1)
        var = max(var_d_v1, var_d_min)
    elif var_ver=='consistent':
        var_used = 'consistent'
        var_d1 = SSd1 / (n1 - 1)
        var_d2 = SSd2 / (n2 - 1)
        var_d = SSd / ((n1 - 1)*(n2 - 1))
        var = ((n2 - 1)*var_d1 + (n1 - 1)*var_d2 + var_d)/(n1*n2)
    else:
        raise ValueError("var_ver should be 'unbiased' or 'consistent'")
    
    C = d/(var**0.5)    

    if test_ver=='cliff':
        df = 'n.a.'
        test_used = 'Cliff Delta test with standard normal distribution'
        
        p = 2 * (1 - NormalDist().cdf(abs(C))) 
    
    else:                
        r_ast_1 = (S == 1).sum(axis=1) + 1/2*(S == 0).sum(axis=1)
        r_ast_2 = (S == -1).sum(axis=0) + 1/2*(S == 0).sum(axis=0)
    
        var_R_ast_1 = r_ast_1.var(ddof=1)
        var_R_ast_2 = r_ast_2.var(ddof=1)

        if test_ver=='fligner-policello' or test_ver=='fp':
            test_used = 'Cliff Delta test with t distribution, and Fligner-Policello df'
            a = var_R_ast_1/n1
            b = var_R_ast_2/n2
        elif test_ver=='brunner-munzel' or test_ver=='bm':
            test_used = 'Cliff Delta test with t distribution, and Brunner-Munzel df'
            a = 1/n1*var_R_ast_1/n2**2
            b = 1/n2*var_R_ast_2/n1**2
        else:
            raise ValueError("test_ver should be 'cliff', 'fligner-policello' (fp), or 'brunner-munzel' (bm)")

        # degrees of freedom
        df = (a + b)**2 /(a**2/(n1 - 1) + b**2/(n2 - 1))
        if round_df:
            df = round(df,0)
            
        p = 2*(1 - t.cdf(abs(C), df))

    results = {'var':var, 'Cliff delta':d, 'test statistic':C, 'df':df, 'p-value':p, 'categories':cats, 'var used':var_used, 'test used':test_used}
    results_pd = pd.DataFrame([results])
    return results_pd

Functions

def ts_cliff_delta_is(bin_field, ord_field, categories=None, levels=None, var_ver='unbiased', test_ver='cliff', round_df=False)
