Module stikpetP.tests.test_mann_whitney

import pandas as pd
from scipy.stats import rankdata
from statistics import NormalDist
from ..distributions.dist_mann_whitney_wilcoxon import di_mwwcdf

def ts_mann_whitney(catField, ordField, categories=None, levels=None, method="exact", cc=True):
    '''
    Mann-Whitney U Test / Wilcoxon Rank Sum Test
    --------------------------------------------
    The Mann-Whitney U and Wilcoxon Rank Sum test are the same. Mann and Whitney simply expanded on the ideas from Wilcoxon.
    
    The test compares the distribution of ranks between two categories. The null hypothesis is (roughly put) that the two categories have the same mean rank (often simplified to having the same median in the population). More strictly, the null hypothesis is that the probability of a randomly selected case from one category having a score greater than a random score from the other category is 50% (Divine et al., 2018, p. 286).
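This null hypothesis connects directly to the U statistic: U1 counts the pairs in which a score from the first category beats a score from the second (ties counting half), so U1 divided by the number of pairs estimates that probability. A minimal sketch (not part of the package), using the fruit data from the Examples section:

```python
# pair-counting view of U1, using the fruit data from the Examples section
cat1 = [6, 5, 7, 2]  # the "peer" scores
cat2 = [4, 3, 1]     # the "apple" scores

# count the pairs where cat1 wins; a tie counts as half a win
wins = sum((x > y) + 0.5 * (x == y) for x in cat1 for y in cat2)
print(wins)                            # 10.0, the same as U1 in the Examples
print(wins / (len(cat1) * len(cat2)))  # ~0.833, the estimated P(X > Y)
```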

    This function is shown in this [YouTube video](https://youtu.be/xfI5yapvsBU) and the test is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Tests/MannWhitneyUtest.html).
    
    Parameters
    ----------
    catField : pandas series
        data with categories for the rows
    ordField : pandas series
        data with the scores (ordinal field)
    categories : list or dictionary, optional
        the two categories to use from catField. If not set, the two most frequently occurring categories will be used
    levels : list or dictionary, optional
        the scores in order
    method : {"exact", "approx"}, optional
        use of the exact distribution (automatically switched to "approx" if ties are present) or the normal approximation. Default is "exact".
    cc : boolean, optional
        use of continuity correction. Default is True.

    Returns
    -------
    A dataframe with:
    
    * *n*, the sample size
    * *U1*, the Mann-Whitney U score of the first category
    * *U2*, the Mann-Whitney U score of the second category
    * *statistic*, the test statistic (z-value)
    * *p-value*, the significance (p-value)
    * *test*, description of the test used
    
    Notes
    -----
    The formula used is (Mann & Whitney, 1947, p. 51):
    $$U_i = R_i - \\frac{n_i\\times\\left(n_i + 1\\right)}{2}$$
    
    With:
    $$R_i = \\sum_{j=1}^{n_i} r_{i,j}$$
    
    For an approximation the following is used:
    $$sig. = 2\\times\\left(1 - Z\\left(\\left|z\\right|\\right)\\right)$$
    
    With:
    $$z = \\frac{U_i - \\frac{n_1\\times n_2}{2}}{SE}$$
    $$SE = \\sqrt{\\frac{n_1\\times n_2}{n\\times\\left(n - 1\\right)}\\times\\left(\\frac{n^3 - n}{12} - \\sum_i T_i\\right)}$$
    $$T_i = \\frac{t_i^3 - t_i}{12}$$
    $$n = n_1 + n_2$$
    
    If a continuity correction is used the z-value is calculated using:
    $$z_{cc} = z - \\frac{0.5}{SE}$$
    
    *Symbols used:*
    
    * \\(n_i\\) the sample size of category i
    * \\(n\\) the total sample size
    * \\(r_{i,j}\\) the j-th rank of category i
    * \\(t_i\\) the number of cases tied at the i-th tied rank
    * \\(Z\\left(\\dots\\right)\\) the cumulative distribution function of the standard normal distribution
    
    The ties correction (\\(T_i\\)) can be found in Lehmann and D'Abrera (1975, p. 20).
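As a numeric check (a sketch, not part of the package; the mid-rank helper re-implements in plain Python what scipy.stats.rankdata does with method="average"), the approximation formulas above can be applied to the small untied fruit example from the Examples section:

```python
from statistics import NormalDist

def avg_ranks(scores):
    # assign 1-based ranks, averaging over ties (mid-ranks),
    # like scipy.stats.rankdata(method="average")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        midrank = (i + j + 2) / 2  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    return ranks

cat1 = [6, 5, 7, 2]  # the "peer" scores from the Examples section
cat2 = [4, 3, 1]     # the "apple" scores
n1, n2 = len(cat1), len(cat2)
n = n1 + n2
ranks = avg_ranks(cat1 + cat2)
R1 = sum(ranks[:n1])
U1 = R1 - n1 * (n1 + 1) / 2
U2 = n1 * n2 - U1  # since U1 + U2 = n1 * n2
# no tied scores here, so the ties correction sum of T_i is zero
SE = (n1 * n2 / (n * (n - 1)) * ((n**3 - n) / 12 - 0)) ** 0.5
z = (U1 - n1 * n2 / 2) / SE
sig = 2 * (1 - NormalDist().cdf(abs(z)))
print(U1, U2)       # 10.0 2.0, matching the exact example in the docstring
print(round(z, 4))  # 1.4142
print(sig)          # about 0.157
```

With only seven cases the normal approximation is rough; this is why the docstring example falls back to the exact distribution for such small, tie-free samples.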
    
    For the exact distribution the Mann-Whitney-Wilcoxon distribution is used.
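For intuition, the exact distribution can be brute-forced for tiny, tie-free samples by enumerating every possible assignment of ranks to the first category (a sketch only; the package's di_mwwcdf uses the Mann-Whitney-Wilcoxon distribution directly):

```python
from itertools import combinations
from math import comb

def exact_mww_pvalue(u, n1, n2):
    # two-sided exact p-value by enumerating every way to assign
    # n1 of the ranks 1..n1+n2 to the first category (assumes no ties)
    N = n1 + n2
    u_small = min(u, n1 * n2 - u)  # work with the smaller-tail U
    tail = sum(1 for c in combinations(range(1, N + 1), n1)
               if sum(c) - n1 * (n1 + 1) / 2 <= u_small)
    return min(1.0, 2 * tail / comb(N, n1))

print(exact_mww_pvalue(10, 4, 3))  # 0.22857..., matching the Examples section
```

Enumeration grows as C(n1+n2, n1), so this only works for very small samples; the distribution function avoids that blow-up.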
    
    Wilcoxon (1945) had developed this test earlier for the case when both categories have the same sample size, and Mann and Whitney expanded on this.

    Before, After and Alternatives
    ------------------------------
    Before running the test you might first want to get an impression using a cross table and/or some visualisation:
    
    * [tab_cross](../other/table_cross.html#tab_cross)
    * [vi_bar_stacked_multiple](../visualisations/vis_bar_stacked_multiple.html#vi_bar_stacked_multiple)

    After the test you might want an effect size measure:
    
    * [es_common_language_is](../effect_sizes/eff_size_common_language_is.html#es_common_language_is) for Common Language Effect Size
    * [me_hodges_lehmann_is](../measures/meas_hodges_lehmann_is.html#me_hodges_lehmann_is) for Hodges-Lehmann estimate
    * [r_rank_biserial_is](../correlations/cor_rank_biserial_is.html#r_rank_biserial_is) for (Glass) Rank Biserial (Cliff delta)

    Alternatives for this test could be:
    
    * [ts_fligner_policello](../tests/test_fligner_policello.html#ts_fligner_policello) for the Fligner-Policello test
    * [ts_brunner_munzel](../tests/test_brunner_munzel.html#ts_brunner_munzel) for the Brunner-Munzel test
    * [ts_brunner_munzel_perm](../tests/test_brunner_munzel.html#ts_brunner_munzel_perm) for the Brunner-Munzel Permutation test
    * [ts_c_square](../tests/test_c_square.html#ts_c_square) for the \\(C^2\\) test
    
    See Also
    --------
    stikpetP.distributions.dist_mann_whitney_wilcoxon.di_mwwcdf : the cumulative distribution function for the Mann-Whitney-Wilcoxon distribution
    
    References
    ----------
    Divine, G. W., Norton, H. J., Barón, A. E., & Juarez-Colunga, E. (2018). The Wilcoxon-Mann-Whitney procedure fails as a test of medians. *The American Statistician, 72*(3), 278–286. doi:10.1080/00031305.2017.1305291
    
    Lehmann, E. L., & D’Abrera, H. J. M. (1975). *Nonparametrics: Statistical methods based on ranks*. Holden-Day.
    
    Mann, H. B., & Whitney, D. R. (1947). On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. *The Annals of Mathematical Statistics, 18*(1), 50–60. https://doi.org/10.1214/aoms/1177730491
    
    Wilcoxon, F. (1945). Individual comparisons by ranking methods. *Biometrics Bulletin, 1*(6), 80. https://doi.org/10.2307/3001968
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076
    
    Examples
    --------
    >>> file1 = "https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv"
    >>> df1 = pd.read_csv(file1, sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
    >>> myLevels = {'Not scientific at all': 1, 'Not too scientific': 2, 'Pretty scientific': 3, 'Very scientific': 4}
    >>> ts_mann_whitney(df1['sex'], df1['accntsci'], levels=myLevels)
    ties exist, switching to approximation
         n        U1        U2  statistic   p-value                                                             test
    0  954  111329.0  115575.0   0.524197  0.600142  Mann-Whitney U normal approximation, with continuity correction

    >>> binary = ["apple", "apple", "apple", "peer", "peer", "peer", "peer"]
    >>> ordinal = [4, 3, 1, 6, 5, 7, 2]
    >>> ts_mann_whitney(binary, ordinal)
       n    U1   U2 statistic   p-value                  test
    0  7  10.0  2.0      n.a.  0.228571  Mann-Whitney U exact
    '''
    
    #convert lists to pandas series if needed
    if isinstance(catField, list):
        catField = pd.Series(catField)
    
    if isinstance(ordField, list):
        ordField = pd.Series(ordField)
    
    #combine as one dataframe
    df = pd.concat([catField, ordField], axis=1)
    df = df.dropna()
    
    #replace the ordinal values if levels is provided
    if levels is not None:
        col_name = df.columns[1]
        df[col_name] = df[col_name].map(levels).astype('Int8')
    else:
        df.iloc[:, 1] = pd.to_numeric(df.iloc[:, 1])
    
    #the two categories
    if categories is not None:
        cat1 = categories[0]
        cat2 = categories[1]
    else:
        cat1 = df.iloc[:,0].value_counts().index[0]
        cat2 = df.iloc[:,0].value_counts().index[1]
    
    #separate the scores for each category
    scoresCat1 = list(df.iloc[:,1][df.iloc[:,0] == cat1])
    scoresCat2 = list(df.iloc[:,1][df.iloc[:,0] == cat2])
    
    n1 = len(scoresCat1)
    n2 = len(scoresCat2)
    N = n1 + n2
    
    #combine this into one long list
    allScores = scoresCat1 + scoresCat2
    
    #get the ranks
    allRanks = rankdata(allScores)
    
    #get the ranks per category
    cat1Ranks = allRanks[0:n1]
    cat2Ranks = allRanks[n1:N]
    
    R1 = sum(cat1Ranks)
    R2 = sum(cat2Ranks)
    
    #The U statistics
    U1 = R1 - n1 * (n1 + 1) / 2
    U2 = R2 - n2 * (n2 + 1) / 2
    U = min(U1, U2)
    
    #The count of each rank
    counts = pd.Series(allRanks).value_counts()
    
    #check if ties exist
    if max(counts) > 1 and method == "exact":
        print("ties exist, switching to approximation")
        method = "approx"
    
    if method=="exact":
        testUsed = "Mann-Whitney U exact"
        if U2 == U:
            #make sure n1 belongs to the category with the smaller U
            n1, n2 = n2, n1
        
        p = di_mwwcdf(int(U), n1, n2)
        
        if U >= n1 * n2 / 2:
            p = 1 - p
        
        pValue = 2 * p        
        statistic = "n.a."
    else:    
        testUsed = "Mann-Whitney U normal approximation"
        T = sum(counts**3-counts)/12

        SE = (n1 * n2 / (N * (N - 1)) * ((N**3 - N) / 12 - T)) ** 0.5

        #Z2 would simply be -Z1 (since U1 + U2 = n1*n2), so only Z1 is needed
        Z1 = (2 * U1 - n1 * n2) / (2 * SE)
        
        zabs = abs(Z1)
        
        if cc:
            zabs = zabs - 0.5/SE
            testUsed = "Mann-Whitney U normal approximation, with continuity correction"
        
        #still need abs since cc could make it negative
        pValue = 2 * (1 - NormalDist().cdf(abs(zabs))) 
        statistic = zabs
        
    #the results
    results = pd.DataFrame([[N, U1, U2, statistic, pValue, testUsed]], columns=["n", "U1", "U2", "statistic", "p-value", "test"])
    
    return results
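Since the module already depends on SciPy, the untied fruit example can be cross-checked against scipy.stats.mannwhitneyu (assuming SciPy >= 1.7 for the `method` argument):

```python
from scipy.stats import mannwhitneyu

peer = [6, 5, 7, 2]
apple = [4, 3, 1]
res = mannwhitneyu(peer, apple, method="exact", alternative="two-sided")
print(res.statistic)         # 10.0, the U of the first sample
print(round(res.pvalue, 6))  # 0.228571, matching ts_mann_whitney's exact result
```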
