Module stikpetP.tests.test_c_square

import pandas as pd
from scipy.stats import chi2

def ts_c_square(bin_field, ord_field, categories=None, levels=None):
    '''
    C-square Test
    -----------------------------------------
    An improved version of the Brunner-Munzel test. It tests so-called stochastic dominance: if you pick a random score from the first category, is the chance that it is higher than a random score from the other category different from 50%? Note that this is not the same as a median test. A very short example:
    
    If we have the scores 40, 50 and 60 in group A, and the scores 30, 50 and 51 in group B, the median of each group is 50. However, if we pick 40 from group A, there is a 1/3 chance it is higher than a value from group B. If we pick 50 there is also a 1/3 chance, and if we pick 60 there is a 3/3 chance. Together, there is a (1/3 + 1/3 + 3/3)/3 = 5/9 chance that a randomly picked value from group A is higher than a randomly picked value from group B. This is quite different from 50%.
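    This arithmetic can be verified with a few lines of plain Python (an illustrative check, not part of this function; the data are the made-up scores from the example):

```python
# chance that a random score from group A beats a random score from group B
# (ties do not count, matching the worked example above)
a = [40, 50, 60]
b = [30, 50, 51]
p = sum(x > y for x in a for y in b) / (len(a) * len(b))
print(p)  # 5/9, approximately 0.5556
```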

    The test can therefore be used if we have a binary and an ordinal variable. The Brunner-Munzel test is known to be unsuitable for small sample sizes; this C-square test has no such limitation.

    This function is shown in this [YouTube video](https://youtu.be/ndGxBskP9NI) and the test is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Tests/C-squareTest.html).

    Parameters
    ----------
    bin_field : pandas series
        data with categories for the rows
    ord_field : pandas series
        data with the scores (ordinal field)
    categories : list or dictionary, optional
        the two categories to use from bin_field. If not set, the two most frequently occurring categories will be used
    levels : list or dictionary, optional
        the scores in order

    Returns
    -------
    A dataframe with:
    
    * *var. est.*, the estimated overall variance
    * *test statistic*, the C-square test statistic
    * *df*, the degrees of freedom
    * *p-value*, the significance (p-value), two-tailed
    * *categories*, the two categories that were used

    Notes
    -----
    The test statistic (Schüürhuis et al., 2025, p. 9, eq. 17):
    $$C^2 = \\frac{4}{\\hat{\\sigma}_N^2} \\times \\hat{\\theta}_N\\times\\left(1 - \\hat{\\theta}_N\\right) \\times \\left(\\hat{\\theta}_N - \\frac{1}{2}\\right)^2$$

    the estimate of the overall variance (Schüürhuis et al., 2025, p. 5, eq. 5):
    $$\\hat{\\sigma}_N^2 = \\frac{1}{d_n}\\times\\left(\\left(\\sum_{k=1}^2 SS_k^*\\right) - n_1 \\times n_2 \\times \\left(\\hat{\\theta}_N\\times\\left(1 - \\hat{\\theta}_N\\right) - \\frac{\\hat{\\tau}_N}{4}\\right)\\right)$$
    
    the sum of squares for each of the two categories of the placement values:
    $$SS_k^* = \\sum_{i=1}^{n_k} \\left(R_{ik}^* - \\bar{R}_{k}^*\\right)^2$$

    the mean of placement values (Schüürhuis et al., 2025, p. 5):
    $$\\bar{R}_{k}^* = \\frac{\\sum_{i=1}^{n_k} R_{ik}^*}{n_k}$$

    the placement values:
    $$R_{ik}^* = R_{ik} - R_{ik}^{(k)}$$

    Denominator (Schüürhuis et al., 2025, p. 5):
    $$d_n = n_1\\times\\left(n_1 - 1\\right) \\times n_2\\times\\left(n_2 - 1\\right)$$

    the probability of ties in the overlap (Schüürhuis et al., 2025, p. 5, eq. 4):
    $$\\hat{\\tau}_N = \\frac{1}{n_1}\\times\\left(\\bar{R}_2^+ - \\bar{R}_2^- - \\left(\\bar{R}_2^{(2)+} - \\bar{R}_2^{(2)-}\\right)\\right)$$

    estimated Mann-Whitney effect (Schüürhuis et al., 2025, p. 4, eq. 1):
    $$\\hat{\\theta}_N = \\frac{1}{n_1} \\times \\left(\\bar{R}_2 - \\frac{n_2 + 1}{2}\\right)$$
    note that this is the same as \\(\\hat{p}\\) in the Brunner-Munzel test.

    some means from the second category:
    $$\\bar{R}_2 = \\frac{\\sum_{i=1}^{n_2} R_{i2}}{n_2}, \\bar{R}_2^+ = \\frac{\\sum_{i=1}^{n_2} R_{i2}^+}{n_2}, \\bar{R}_2^- = \\frac{\\sum_{i=1}^{n_2} R_{i2}^-}{n_2}, \\bar{R}_2^{(2)+} = \\frac{\\sum_{i=1}^{n_2} R_{i2}^{(2)+}}{n_2}, \\bar{R}_2^{(2)-} = \\frac{\\sum_{i=1}^{n_2} R_{i2}^{(2)-}}{n_2}$$

    *Symbols used:*
    * \\(R_{ik}\\), the mid rank of the i-th score in category k, when using all combined scores
    * \\(R_{ik}^{(k)}\\), the mid-rank of the i-th score in category k, when using only scores from category k
    * \\(R_{ik}^-\\), the minimum rank of the i-th score in category k, when using all combined scores
    * \\(R_{ik}^{(k)-}\\), the minimum-rank of the i-th score in category k, when using only scores from category k
    * \\(R_{ik}^+\\), the maximum rank of the i-th score in category k, when using all combined scores
    * \\(R_{ik}^{(k)+}\\), the maximum-rank of the i-th score in category k, when using only scores from category k
    * \\(N\\), the total sample size
    * \\(n_{k}\\), the number of scores in category k
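    The mid, minimum, and maximum ranks listed above correspond to the tie-breaking methods of pandas' `Series.rank` used further below (a minimal illustration; the series values are made up):

```python
import pandas as pd

s = pd.Series([30, 50, 50, 60])
mid = s.rank()                # ties share the average (mid) rank
low = s.rank(method='min')    # ties get the lowest rank in the tie group
high = s.rank(method='max')   # ties get the highest rank in the tie group
print(mid.tolist())   # [1.0, 2.5, 2.5, 4.0]
print(low.tolist())   # [1.0, 2.0, 2.0, 4.0]
print(high.tolist())  # [1.0, 3.0, 3.0, 4.0]
```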

    Before, After and Alternatives
    ------------------------------
    Before running the test you might first want to get an impression using a cross table and/or some visualisation:
    
    * [tab_cross](../other/table_cross.html#tab_cross)
    * [vi_bar_stacked_multiple](../visualisations/vis_bar_stacked_multiple.html#vi_bar_stacked_multiple)

    After the test you might want an effect size measure:
    
    * [es_common_language_is](../effect_sizes/eff_size_common_language_is.html#es_common_language_is) for Common Language Effect Size
    * [me_hodges_lehmann_is](../measures/meas_hodges_lehmann_is.html#me_hodges_lehmann_is) for Hodges-Lehmann estimate
    * [r_rank_biserial_is](../correlations/cor_rank_biserial_is.html#r_rank_biserial_is) for (Glass) Rank Biserial (Cliff delta)

    Alternatives for this test could be:
    
    * [ts_mann_whitney](../tests/test_mann_whitney.html#ts_mann_whitney) for the Mann-Whitney U test
    * [ts_fligner_policello](../tests/test_fligner_policello.html#ts_fligner_policello) for the Fligner-Policello test
    * [ts_brunner_munzel](../tests/test_brunner_munzel.html#ts_brunner_munzel) for the Brunner-Munzel test
    * [ts_brunner_munzel_perm](../tests/test_brunner_munzel.html#ts_brunner_munzel_perm) for the Brunner-Munzel Permutation test

    References
    ----------
    Schüürhuis, S., Konietschke, F., & Brunner, E. (2025). A new approach to the Nonparametric Behrens-Fisher problem with compatible confidence intervals (arXiv:2504.01796). arXiv. https://doi.org/10.48550/arXiv.2504.01796

    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076

    Examples
    --------
    >>> file1 = "https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv"
    >>> df1 = pd.read_csv(file1, sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
    >>> coding = {'Far too little': 1, 'too little': 2, 'Enough': 3, 'Too much': 4, 'Far too much': 5}
    >>> ts_c_square(df1['Gen_Gender'], df1['Mix_NrAct'], levels=coding)
       var. est.  test statistic  df   p-value    categories
    0   0.003434       14.595231   1  0.000133  Male, Female
    
    '''
    # DATA PREPARATION
    # convert lists to pandas series if needed
    if isinstance(bin_field, list):
        bin_field = pd.Series(bin_field)
    
    if isinstance(ord_field, list):
        ord_field = pd.Series(ord_field)
    
    #combine as one dataframe
    df = pd.concat([bin_field, ord_field], axis=1)
    df = df.dropna()
    df.columns=['cat', 'score']
    
    #replace the ordinal values if levels is provided
    if levels is not None:
        df['score'] = df['score'].map(levels).astype('Int8')
    else:
        df['score'] = pd.to_numeric(df['score'])
    
    #the two categories
    if categories is not None:
        cat1 = categories[0]
        cat2 = categories[1]
    else:
        cat1 = df.iloc[:,0].value_counts().index[0]
        cat2 = df.iloc[:,0].value_counts().index[1]
    # keep only scores in one of the two categories
    # (.copy() so the column assignments below do not trigger SettingWithCopyWarning)
    df = df[df['cat'].isin([cat1, cat2])].copy()

    df['mid-rank'] = df['score'].rank()
    df['min-rank'] = df['score'].rank(method='min')
    df['max-rank'] = df['score'].rank(method='max')
    df['mid-rank_within'] = df.groupby('cat')['score'].rank()
    nk = df['cat'].value_counts()
    
    df_2 = df[df['cat'] == cat2].copy()
    df_2['min-rank_within'] = df_2['score'].rank(method='min')
    df_2['max-rank_within'] = df_2['score'].rank(method='max')

    barR2 = df_2['mid-rank'].mean()
    barR2plus = df_2['max-rank'].mean()
    barR2min = df_2['min-rank'].mean()
    barR2plus2 = df_2['max-rank_within'].mean()
    barR2min2 = df_2['min-rank_within'].mean()

    thetaN = 1/nk[cat1]*(barR2 - (nk[cat2] + 1)/2)
    tauN = 1/nk[cat1]*(barR2plus - barR2min - (barR2plus2 - barR2min2))
    dn = nk[cat1]*(nk[cat1] - 1)*nk[cat2]*(nk[cat2] - 1)
    
    df['placement'] = df['mid-rank'] - df['mid-rank_within']
    # sum of squares per category: population variance (ddof=0) times group size
    SSrAst = df.groupby('cat')['placement'].var(ddof=0)*nk
    varN = 1/dn*(sum(SSrAst) - nk[cat1]*nk[cat2]*(thetaN*(1 - thetaN) - tauN/4))
    Csq = 1/varN*4*thetaN*(1 - thetaN)*(thetaN - 1/2)**2
    p_val = chi2.sf(Csq, 1)

    results = {'var. est.': varN, 'test statistic': Csq, 'df': 1, 'p-value': p_val, 'categories': f'{cat1}, {cat2}'}
    results_pd = pd.DataFrame([results])
    
    return results_pd
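Since the statistic is compared against a chi-square distribution with one degree of freedom, the p-value in the docstring example can be cross-checked with only the standard library: for df = 1, `chi2.sf(x, 1)` equals `erfc(sqrt(x / 2))` (an illustrative check, not part of the module):

```python
import math

# chi-square survival function for 1 degree of freedom,
# via the complementary error function
def chi2_sf_df1(x):
    return math.erfc(math.sqrt(x / 2))

# test statistic from the docstring example
print(round(chi2_sf_df1(14.595231), 6))  # 0.000133
```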
