Module stikpetP.tests.test_mann_whitney
import pandas as pd
from scipy.stats import rankdata
from statistics import NormalDist
from ..distributions.dist_mann_whitney_wilcoxon import di_mwwcdf
def ts_mann_whitney(catField, ordField, categories=None, levels=None, method="exact", cc=True):
    '''
    Mann-Whitney U Test / Wilcoxon Rank Sum Test
    --------------------------------------------
    The Mann-Whitney U and Wilcoxon Rank Sum test are the same. Mann and Whitney simply expanded on the ideas from Wilcoxon.

    The test compares the distribution of ranks between two categories. The null hypothesis is (roughly put) that the two categories have the same mean rank (often stated, in simplified form, as the two categories having the same median in the population). More strictly, the null hypothesis is that the probability that a randomly selected case has a score greater than a random score from the other category is 50% (Divine et al., 2018, p. 286).

    This function is shown in this [YouTube video](https://youtu.be/xfI5yapvsBU) and the test is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Tests/MannWhitneyUtest.html).

    Parameters
    ----------
    catField : pandas series
        data with categories for the rows
    ordField : pandas series
        data with the scores (ordinal field)
    categories : list or dictionary, optional
        the two categories to use from catField. If not set, the two most frequent categories will be used
    levels : list or dictionary, optional
        the scores in order
    method : {"exact", "approx"}, optional
        the use of the exact distribution (only if there are no ties) or the normal approximation.
    cc : boolean, optional
        use of continuity correction. Default is True.

    Returns
    -------
    A dataframe with:

    * *n*, the sample size
    * *U1*, the Mann-Whitney U score of the first category
    * *U2*, the Mann-Whitney U score of the second category
    * *statistic*, the test statistic (z-value)
    * *p-value*, the significance (p-value)
    * *test*, description of the test used

    Notes
    -----
    The formula used is (Mann & Whitney, 1947, p. 51):
    $$U_i = R_i - \\frac{n_i\\times\\left(n_i + 1\\right)}{2}$$
    With:
    $$R_i = \\sum_{j=1}^{n_i} r_{i,j}$$

    For an approximation the following is used:
    $$sig. = 2\\times\\left(1 - Z\\left(z\\right)\\right)$$
    With:
    $$z = \\frac{U_i - \\frac{n_1\\times n_2}{2}}{SE}$$
    $$SE = \\sqrt{\\frac{n_1\\times n_2}{n\\times\\left(n - 1\\right)}\\times\\left(\\frac{n^3 - n}{12} - \\sum_i T_i\\right)}$$
    $$T_i = \\frac{t_i^3 - t_i}{12}$$
    $$n = n_1 + n_2$$

    If a continuity correction is used the z-value is calculated using:
    $$z_{cc} = z - \\frac{0.5}{SE}$$

    *Symbols used:*

    * \\(n_i\\) the sample size of category i
    * \\(n\\) the total sample size
    * \\(r_{i,j}\\) the j-th rank of category i

    The ties correction (\\(T\\)) can be found in Lehmann and D'Abrera (1975, p. 20).

    For the exact distribution the Mann-Whitney-Wilcoxon distribution is used.

    Wilcoxon (1945) had developed this test earlier for the case when both categories have the same sample size, and Mann and Whitney expanded on this.

    Before, After and Alternatives
    ------------------------------
    Before running the test you might first want to get an impression using a cross table and/or some visualisation:

    * [tab_cross](../other/table_cross.html#tab_cross)
    * [vi_bar_stacked_multiple](../visualisations/vis_bar_stacked_multiple.html#vi_bar_stacked_multiple)

    After the test you might want an effect size measure:

    * [es_common_language_is](../effect_sizes/eff_size_common_language_is.html#es_common_language_is) for Common Language Effect Size
    * [me_hodges_lehmann_is](../measures/meas_hodges_lehmann_is.html#me_hodges_lehmann_is) for Hodges-Lehmann estimate
    * [r_rank_biserial_is](../correlations/cor_rank_biserial_is.html#r_rank_biserial_is) for (Glass) Rank Biserial (Cliff delta)

    Alternatives for this test could be:

    * [ts_fligner_policello](../tests/test_fligner_policello.html#ts_fligner_policello) for the Fligner-Policello test
    * [ts_brunner_munzel](../tests/test_brunner_munzel.html#ts_brunner_munzel) for the Brunner-Munzel test
    * [ts_brunner_munzel_perm](../tests/test_brunner_munzel.html#ts_brunner_munzel_perm) for the Brunner-Munzel Permutation test
    * [ts_c_square](../tests/test_c_square.html#ts_c_square) for the \\(C^2\\) test

    See Also
    --------
    stikpetP.distributions.dist_mann_whitney_wilcoxon.di_mwwcdf : the cumulative distribution function for the Mann-Whitney-Wilcoxon distribution

    References
    ----------
    Divine, G. W., Norton, H. J., Barón, A. E., & Juarez-Colunga, E. (2018). The Wilcoxon-Mann-Whitney procedure fails as a test of medians. *The American Statistician, 72*(3), 278–286. doi:10.1080/00031305.2017.1305291

    Lehmann, E. L., & D’Abrera, H. J. M. (1975). *Nonparametrics: Statistical methods based on ranks*. Holden-Day.

    Mann, H. B., & Whitney, D. R. (1947). On a test of whether one of two random variables is stochastically larger than the other. *The Annals of Mathematical Statistics, 18*(1), 50–60. https://doi.org/10.1214/aoms/1177730491

    Wilcoxon, F. (1945). Individual comparisons by ranking methods. *Biometrics Bulletin, 1*(6), 80. https://doi.org/10.2307/3001968

    Author
    ------
    Made by P. Stikker

    Companion website: https://PeterStatistics.com
    YouTube channel: https://www.youtube.com/stikpet
    Donations: https://www.patreon.com/bePatron?u=19398076

    Examples
    --------
    >>> file1 = "https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv"
    >>> df1 = pd.read_csv(file1, sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
    >>> myLevels = {'Not scientific at all': 1, 'Not too scientific': 2, 'Pretty scientific': 3, 'Very scientific': 4}
    >>> ts_mann_whitney(df1['sex'], df1['accntsci'], levels=myLevels)
    ties exist, switch to approximate
         n        U1        U2  statistic   p-value                                                              test
    0  954  111329.0  115575.0   0.524197  0.600142  Mann-Whitney U normal approximation, with continuity correction

    >>> binary = ["apple", "apple", "apple", "peer", "peer", "peer", "peer"]
    >>> ordinal = [4, 3, 1, 6, 5, 7, 2]
    >>> ts_mann_whitney(binary, ordinal)
       n    U1   U2 statistic   p-value                  test
    0  7  10.0  2.0      n.a.  0.228571  Mann-Whitney U exact
    '''
    # convert to pandas series if needed
    if type(catField) is list:
        catField = pd.Series(catField)
    if type(ordField) is list:
        ordField = pd.Series(ordField)

    # combine as one dataframe and drop missing values
    df = pd.concat([catField, ordField], axis=1)
    df = df.dropna()

    # replace the ordinal values if levels is provided
    if levels is not None:
        col_name = df.columns[1]
        df[col_name] = df[col_name].map(levels).astype('Int8')
    else:
        df.iloc[:, 1] = pd.to_numeric(df.iloc[:, 1])

    # the two categories (if not given, the two most frequent are used)
    if categories is not None:
        cat1 = categories[0]
        cat2 = categories[1]
    else:
        cat1 = df.iloc[:, 0].value_counts().index[0]
        cat2 = df.iloc[:, 0].value_counts().index[1]

    # separate the scores for each category
    scoresCat1 = list(df.iloc[:, 1][df.iloc[:, 0] == cat1])
    scoresCat2 = list(df.iloc[:, 1][df.iloc[:, 0] == cat2])
    n1 = len(scoresCat1)
    n2 = len(scoresCat2)
    N = n1 + n2

    # combine this into one long list
    allScores = scoresCat1 + scoresCat2
    # get the ranks
    allRanks = rankdata(allScores)
    # get the ranks per category
    cat1Ranks = allRanks[0:n1]
    cat2Ranks = allRanks[n1:N]
    R1 = sum(cat1Ranks)
    R2 = sum(cat2Ranks)

    # the U statistics
    U1 = R1 - n1 * (n1 + 1) / 2
    U2 = R2 - n2 * (n2 + 1) / 2
    U = min(U1, U2)

    # the count of each rank
    counts = pd.Series(allRanks).value_counts()
    # the exact distribution requires no ties; fall back to the approximation otherwise
    if max(counts) > 1 and method == "exact":
        print("ties exist, switch to approximate")
        method = "approx"

    if method == "exact":
        testUsed = "Mann-Whitney U exact"
        if U2 == U:
            n1, n2 = n2, n1
        p = di_mwwcdf(int(U), n1, n2)
        if U >= n1 * n2 / 2:
            p = 1 - p
        pValue = 2 * p
        statistic = "n.a."
    else:
        testUsed = "Mann-Whitney U normal approximation"
        T = sum(counts**3 - counts) / 12
        SE = (n1 * n2 / (N * (N - 1)) * ((N**3 - N) / 12 - T)) ** 0.5
        Z1 = (2 * U1 - n1 * n2) / (2 * SE)
        Z2 = (2 * U2 - n1 * n2) / (2 * SE)  # symmetric counterpart of Z1 (equal to -Z1)
        zabs = abs(Z1)
        if cc:
            zabs = zabs - 0.5 / SE
            testUsed = "Mann-Whitney U normal approximation, with continuity correction"
        # still need abs, since the continuity correction could make it negative
        pValue = 2 * (1 - NormalDist().cdf(abs(zabs)))
        statistic = zabs

    # the results
    results = pd.DataFrame([[N, U1, U2, statistic, pValue, testUsed]], columns=["n", "U1", "U2", "statistic", "p-value", "test"])
    return results
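As a hand-check of the formulas in the Notes, the small apple/peer doctest can be computed step by step. The scores are exactly the integers 1 through 7, so the ranks equal the scores; and since `ts_mann_whitney` picks its categories by frequency (via `value_counts`), "peer" (4 cases) becomes category 1. This sketch only mirrors the function's own arithmetic, it is not part of the package:

```python
from scipy.stats import rankdata

# scores per category, more frequent category ("peer") first,
# mirroring how ts_mann_whitney orders categories via value_counts()
scores_peer = [6, 5, 7, 2]   # n1 = 4 cases
scores_apple = [4, 3, 1]     # n2 = 3 cases

all_scores = scores_peer + scores_apple
ranks = rankdata(all_scores)           # scores are the integers 1..7, so ranks == scores
n1, n2 = len(scores_peer), len(scores_apple)

R1 = sum(ranks[:n1])                   # rank sum of category 1: 6+5+7+2 = 20
R2 = sum(ranks[n1:])                   # rank sum of category 2: 4+3+1 = 8
U1 = R1 - n1 * (n1 + 1) / 2            # 20 - 10 = 10, matching the doctest
U2 = R2 - n2 * (n2 + 1) / 2            # 8 - 6 = 2
assert U1 + U2 == n1 * n2              # the two U values always sum to n1*n2

# tie-corrected normal approximation for the same data (no ties here, so T = 0)
N = n1 + n2
T = 0.0
SE = (n1 * n2 / (N * (N - 1)) * ((N**3 - N) / 12 - T)) ** 0.5
z = (2 * U1 - n1 * n2) / (2 * SE)      # = 8 / (2 * sqrt(8)) = sqrt(2)
print(U1, U2, round(z, 4))
```

The `U1 + U2 == n1 * n2` identity is a quick sanity check worth keeping in mind when reading the function's output.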