Module stikpetP.tests.test_brunner_munzel
import pandas as pd
from scipy.stats import norm, t
def ts_brunner_munzel(bin_field, ord_field, categories=None, levels=None, distribution='t'):
'''
Brunner-Munzel Test
--------------------------------------------
The Brunner-Munzel test tests so-called stochastic dominance: if you pick a random score from the first category, is the chance higher (or lower) than 50% that it will be higher than a random score from the other category? Note that this is not the same as a median test. A very short example:
If we have the scores 40, 50 and 60 in group A, and the scores 30, 50, and 51 in group B, the median of each group is 50. However, if we pick 40 from group A, there is a 1/3 chance it is higher than a value from group B. If we pick 50 there is also a 1/3 chance, and if we pick 60 there is a 3/3 chance. Together, there is a (1/3+1/3+3/3)/3 = 5/9 chance that a random value from group A is higher than a random value from group B, which is quite different from 50%.
The test can therefore be used if we have a binary and an ordinal variable. The Mann-Whitney U test is probably the most famous test used in these scenarios, but it has some limitations, and the Brunner-Munzel test might be better suited in many cases. The Fligner-Policello test is another test that is sometimes used.
Brunner and Munzel (2000, p. 21) indicate that the test statistic follows a standard normal distribution if each category has 50 or more data points. They also remark (p. 22) that the test is no longer accurate if sample sizes are less than 10, although Schüürhuis et al. (2025, p. 18) list 15. Neubert and Brunner (2007) propose a studentized permutation test for these cases. Schüürhuis et al. (2025) developed an improved version of this test as well, called a \(C^2\) test.
This function is shown in this [YouTube video](https://youtu.be/qwugQCbXtnM) and the test is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Tests/BrunnerMunzelTest.html).
Parameters
----------
bin_field : pandas series
data with categories for the rows
ord_field : pandas series
data with the scores (ordinal field)
categories : list or dictionary, optional
the two categories to use from bin_field. If not set the first two found will be used
levels : list or dictionary, optional
the scores in order
distribution : {"t", "z"}, optional
whether to use the Student t-distribution or the standard normal z-distribution
Returns
-------
A dataframe with:
* *var. est.*, the estimated overall variance
* *min n*, the sample size of the category with the lowest count
* *test-statistic*, the Brunner-Munzel statistic
* *df*, the degrees of freedom
* *p-value*, the significance (p-value), two-tailed
* *categories*, the two categories that were used
* *test used*, description of the test used and distribution.
Notes
-----
The test-statistic is calculated using (Brunner & Munzel, 2000, p. 21, eq. 4.8):
$$W_n^{BM} = \\frac{\\bar{R}_2 - \\bar{R}_1}{\\sqrt{N \\times \\hat{\\sigma}_N^2}}$$
with degrees of freedom (Brunner & Munzel, 2000, p. 22):
$$df = \\frac{\\hat{\\sigma}_N^4}{N^2 \\times \\sum_{k=1}^2 \\frac{\\hat{\\sigma}_k^4}{\\left(n_k - 1\\right)\\times n_k^2}}$$
The total estimated variance (Brunner & Munzel, 2000, p. 21, eq. 4.7):
$$\\hat{\\sigma}_N^2 = N\\times\\left(\\frac{\\hat{\\sigma}_1^2}{n_1} + \\frac{\\hat{\\sigma}_2^2}{n_2}\\right)$$
The estimated variance for each category (Brunner & Munzel, 2000, p. 20, eq. 4.6):
$$\\hat{\\sigma}_k^2 = \\frac{s_{R_k^*}^2}{\\left(N - n_k\\right)^2}$$
the sample variance of placement values (Brunner & Munzel, 2000, p. 20, eq. 4.5):
$$s_{R_k^*}^2 = \\frac{\\sum_{i=1}^{n_k} \\left(R_{ik}^* - \\bar{R}_{k}^*\\right)^2}{n_k - 1}$$
mean of the placement values:
$$\\bar{R}_{k}^* = \\frac{\\sum_{i=1}^{n_k} R_{ik}^*}{n_k}$$
the placement values:
$$R_{ik}^* = R_{ik} - R_{ik}^{(k)}$$
mean of the pooled ranks for each category:
$$\\bar{R}_k = \\frac{\\sum_{i=1}^{n_k} R_{ik}}{n_k}$$
*Symbols used:*
* \\(R_{ik}\\), the rank of the i-th score in category k, when using all combined scores
* \\(R_{ik}^{(k)}\\), the rank of the i-th score in category k, when using only scores from category k
* \\(N\\), the total sample size
* \\(n_{k}\\), the number of scores in category k
Before, After and Alternatives
------------------------------
Before running the test you might first want to get an impression using a cross table and/or some visualisation:
* [tab_cross](../other/table_cross.html#tab_cross)
* [vi_bar_stacked_multiple](../visualisations/vis_bar_stacked_multiple.html#vi_bar_stacked_multiple)
After the test you might want an effect size measure:
* [es_common_language_is](../effect_sizes/eff_size_common_language_is.html#es_common_language_is) for Common Language Effect Size
* [me_hodges_lehmann_is](../measures/meas_hodges_lehmann_is.html#me_hodges_lehmann_is) for Hodges-Lehmann estimate
* [r_rank_biserial_is](../correlations/cor_rank_biserial_is.html#r_rank_biserial_is) for (Glass) Rank Biserial (Cliff delta)
Alternatives for this test could be:
* [ts_mann_whitney](../tests/test_mann_whitney.html#ts_mann_whitney) for the Mann-Whitney U test
* [ts_fligner_policello](../tests/test_fligner_policello.html#ts_fligner_policello) for the Fligner-Policello test
* [ts_brunner_munzel_perm](../tests/test_brunner_munzel.html#ts_brunner_munzel_perm) for the Brunner-Munzel Permutation test
* [ts_c_square](../tests/test_c_square.html#ts_c_square) for the \\(C^2\\) test
References
----------
Brunner, E., & Munzel, U. (2000). The nonparametric Behrens-Fisher problem: Asymptotic theory and a small-sample approximation. *Biometrical Journal, 42*(1), 17–25. https://doi.org/10.1002/(SICI)1521-4036(200001)42:1<17::AID-BIMJ17>3.0.CO;2-U
Neubert, K., & Brunner, E. (2007). A studentized permutation test for the non-parametric Behrens–Fisher problem. *Computational Statistics & Data Analysis, 51*(10), 5192–5204. https://doi.org/10.1016/j.csda.2006.05.024
Schüürhuis, S., Konietschke, F., & Brunner, E. (2025). A new approach to the Nonparametric Behrens-Fisher problem with compatible confidence intervals (arXiv:2504.01796). arXiv. https://doi.org/10.48550/arXiv.2504.01796
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
--------
>>> pd.set_option('display.width',1000)
>>> file1 = "https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv"
>>> df1 = pd.read_csv(file1, sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> coding = {'Far too little': 1, 'too little': 2, 'Enough': 3, 'Too much': 4, 'Far too much': 5}
>>> ts_brunner_munzel(df1['Gen_Gender'], df1['Mix_NrAct'], levels=coding)
var. est. min n test statistic df p-value categories test used
0 0.156508 11 4.465838 36.980916 0.000073 Male, Female Brunner-Munzel with t distribution
'''
    # DATA PREPARATION
    # convert to pandas series if needed
    if isinstance(bin_field, list):
        bin_field = pd.Series(bin_field)
    if isinstance(ord_field, list):
        ord_field = pd.Series(ord_field)
    # combine as one dataframe
    df = pd.concat([bin_field, ord_field], axis=1)
    df = df.dropna()
    df.columns = ['cat', 'score']
    # replace the ordinal values if levels is provided
    if levels is not None:
        df['score'] = df['score'].map(levels).astype('Int8')
    else:
        df.iloc[:, 1] = pd.to_numeric(df.iloc[:, 1])
    # the two categories
    if categories is not None:
        cat1 = categories[0]
        cat2 = categories[1]
    else:
        cat1 = df.iloc[:, 0].value_counts().index[0]
        cat2 = df.iloc[:, 0].value_counts().index[1]
    # remove scores not in either of the two categories
    # (.copy() avoids SettingWithCopy warnings on the assignments below)
    df = df[df['cat'].isin([cat1, cat2])].copy()
    # RANKING
    # the mid-ranks of the pooled scores (R_ik)
    df['rank'] = df['score'].rank()
    # the mid-ranks within each category (R_ik^(k))
    df['rank_within'] = df.groupby('cat')['score'].rank()
    # the placement scores (R_ik^*)
    df['placement'] = df['rank'] - df['rank_within']
    # TEST
    # the unbiased variances for each category (s_Rk*^2)
    unbVarRast = df.groupby('cat')['placement'].var(ddof=1)
    # the counts
    nk = df['cat'].value_counts()
    # the overall sample size
    N = sum(nk)
    # estimated variance for each category (sigma_k^2)
    vark = unbVarRast / (N - nk)**2
    # the overall estimated variance
    varN = N*sum(vark/nk)
    # the mean mid-rank for each category (based on pooled ranks)
    barRk = df.groupby('cat')['rank'].mean()
    # the test statistic
    W = (barRk.iloc[1] - barRk.iloc[0])/(N*varN)**0.5
    # minimum n
    min_n = min(nk)
    if distribution == 't':
        if min_n < 10:
            test_used = 'Brunner-Munzel with t distribution (not recommended)'
        else:
            test_used = 'Brunner-Munzel with t distribution'
        # the degrees of freedom
        deg_fr = varN**2/(N**2*sum(vark**2/((nk - 1)*nk**2)))
        # the two-tailed p-value
        p_val = 2*(1 - t.cdf(abs(W), deg_fr))
    elif distribution == 'z':
        if min_n < 50:
            test_used = 'Brunner-Munzel with standard normal distribution (not recommended)'
        else:
            test_used = 'Brunner-Munzel with standard normal distribution'
        # the two-tailed p-value
        p_val = 2*(1 - norm.cdf(abs(W)))
        deg_fr = 'n.a.'
    else:
        raise ValueError("distribution must be 't' or 'z'")
    results = {'var. est.': varN, 'min n': min_n, 'test statistic': W, 'df': deg_fr,
               'p-value': p_val, 'categories': str(cat1) + ', ' + str(cat2), 'test used': test_used}
    results_pd = pd.DataFrame([results])
    return results_pd
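The stochastic-dominance probability from the docstring's worked example (the 5/9) can be checked directly by counting, over all pairs, how often a score from group A strictly exceeds one from group B. A minimal sketch, not part of the package; the helper name `dominance_prob` is illustrative only:

```python
def dominance_prob(a, b):
    """Proportion of (a_i, b_j) pairs where a_i is strictly greater than b_j."""
    wins = sum(1 for x in a for y in b if x > y)
    return wins / (len(a) * len(b))

a = [40, 50, 60]  # group A from the docstring example
b = [30, 50, 51]  # group B from the docstring example
print(dominance_prob(a, b))  # 5/9, about 0.5556
```

The docstring counts strict wins only (a tie such as 50 vs. 50 counts as a loss), which is what this sketch reproduces; mid-rank-based formulations instead credit ties with 1/2.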
def ts_brunner_munzel_perm(bin_field, ord_field, categories=None, levels=None, n_iter=1000):
'''
Brunner-Munzel Permutation Test
--------------------------------------------
This function performs a studentized permutation version of the Brunner-Munzel test. It is recommended over the regular version when sample sizes are small (e.g. less than 10, or 15 depending on the source).
Parameters
----------
bin_field : pandas series
data with categories for the rows
ord_field : pandas series
data with the scores (ordinal field)
categories : list or dictionary, optional
the two categories to use from bin_field. If not set the first two found will be used
levels : list or dictionary, optional
the scores in order
n_iter : integer, optional
number of iterations.
Returns
-------
A dataframe with:
* *iters*, the number of iterations
* *observed*, the observed Brunner-Munzel statistic
* *n above*, the number of permutations with a value strictly above the observed statistic
* *n below*, the number of permutations with a value strictly below the observed statistic
* *p-appr.*, the significance (p-value), two-tailed
Notes
-----
The idea for a studentized permutation test was proposed by Neubert and Brunner (2007). The regular Brunner-Munzel test statistic is calculated, then the category labels are shuffled and the statistic is re-calculated; this is repeated for the requested number of iterations.
The minimum of the number of permutation statistics above vs. below the observed statistic is multiplied by two and divided by the number of iterations to obtain the p-value (Schüürhuis et al., 2025, p. 7).
See ts_brunner_munzel for more details.
Before, After and Alternatives
------------------------------
Before running the test you might first want to get an impression using a cross table and/or some visualisation:
* [tab_cross](../other/table_cross.html#tab_cross)
* [vi_bar_stacked_multiple](../visualisations/vis_bar_stacked_multiple.html#vi_bar_stacked_multiple)
After the test you might want an effect size measure:
* [es_common_language_is](../effect_sizes/eff_size_common_language_is.html#es_common_language_is) for Common Language Effect Size
* [es_hodges_lehmann_is](../effect_sizes/eff_size_hodges_lehmann_is.html#es_hodges_lehmann_is) for Hodges-Lehmann
* [r_rank_biserial_is](../correlations/cor_rank_biserial_is.html#r_rank_biserial_is) for (Glass) Rank Biserial (Cliff delta)
Alternatives for this test could be:
* [ts_mann_whitney](../tests/test_mann_whitney.html#ts_mann_whitney) for the Mann-Whitney U test
* [ts_fligner_policello](../tests/test_fligner_policello.html#ts_fligner_policello) for the Fligner-Policello test
References
----------
Neubert, K., & Brunner, E. (2007). A studentized permutation test for the non-parametric Behrens–Fisher problem. *Computational Statistics & Data Analysis, 51*(10), 5192–5204. https://doi.org/10.1016/j.csda.2006.05.024
Schüürhuis, S., Konietschke, F., & Brunner, E. (2025). A new approach to the Nonparametric Behrens-Fisher problem with compatible confidence intervals (arXiv:2504.01796). arXiv. https://doi.org/10.48550/arXiv.2504.01796
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
'''
    # DATA PREPARATION
    # convert to pandas series if needed
    if isinstance(bin_field, list):
        bin_field = pd.Series(bin_field)
    if isinstance(ord_field, list):
        ord_field = pd.Series(ord_field)
    # combine as one dataframe
    df = pd.concat([bin_field, ord_field], axis=1)
    df = df.dropna()
    df.columns = ['cat', 'score']
    # replace the ordinal values if levels is provided
    if levels is not None:
        df['score'] = df['score'].map(levels).astype('Int8')
    else:
        df.iloc[:, 1] = pd.to_numeric(df.iloc[:, 1])
    # the two categories
    if categories is not None:
        cat1 = categories[0]
        cat2 = categories[1]
    else:
        cat1 = df.iloc[:, 0].value_counts().index[0]
        cat2 = df.iloc[:, 0].value_counts().index[1]
    # remove scores not in either of the two categories
    # (.copy() avoids SettingWithCopy warnings on the assignments below)
    df = df[df['cat'].isin([cat1, cat2])].copy()
    # the mid-ranks of the pooled scores (R_ik); these do not change when
    # only the category labels are shuffled
    df['rank'] = df['score'].rank()
    # the counts (unchanged under shuffling, since group sizes are fixed)
    nk = df['cat'].value_counts()
    # the overall sample size
    N = sum(nk)
    Ws = []
    # the first pass uses the observed labels; the remaining n_iter passes
    # use shuffled labels
    for _ in range(n_iter + 1):
        # the mid-ranks within each category (R_ik^(k))
        df['rank_within'] = df.groupby('cat')['score'].rank()
        # the placement scores (R_ik^*)
        df['placement'] = df['rank'] - df['rank_within']
        # the unbiased variances for each category (s_Rk*^2)
        unbVarRast = df.groupby('cat')['placement'].var(ddof=1)
        # estimated variance for each category (sigma_k^2)
        vark = unbVarRast / (N - nk)**2
        # the overall estimated variance
        varN = N*sum(vark/nk)
        # the mean mid-rank for each category (based on pooled ranks)
        barRk = df.groupby('cat')['rank'].mean()
        # the test statistic
        W = (barRk.iloc[1] - barRk.iloc[0])/(N*varN)**0.5
        Ws.append(W)
        # shuffle the category labels for the next iteration
        df['cat'] = df['cat'].sample(frac=1).values
    n_above = sum(w > Ws[0] for w in Ws[1:])
    n_below = sum(w < Ws[0] for w in Ws[1:])
    p_appr = 2*min(n_above, n_below)/n_iter
    results = {'iters': n_iter, 'observed': Ws[0], 'n above': n_above, 'n below': n_below, 'p-appr.': p_appr}
    results_pd = pd.DataFrame([results])
    return results_pd
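As a cross-check of the hand-rolled statistic, SciPy ships its own implementation, `scipy.stats.brunnermunzel` (available since SciPy 1.2). A minimal sketch using the small example groups from the docstring; note that the sign of the statistic depends on which group is treated as "first", so only the absolute value is expected to agree:

```python
from scipy.stats import brunnermunzel

a = [40, 50, 60]  # group A from the docstring example
b = [30, 50, 51]  # group B from the docstring example
res = brunnermunzel(a, b, alternative='two-sided', distribution='t')
# res.statistic should match |W| from ts_brunner_munzel up to sign;
# res.pvalue is the two-tailed significance
print(res.statistic, res.pvalue)
```

With only three scores per group this is well below the recommended minimum sample size, so the p-value itself should not be trusted here; the point is only that the two implementations compute the same quantity.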
Functions
def ts_brunner_munzel(bin_field, ord_field, categories=None, levels=None, distribution='t')-
Brunner-Munzel Test
The Brunner-Munzel test.
The Brunner-Munzel test, tests so-called stochastic dominance. It tests if you pick a random score from the first category, will there be a higher (or lower) chance than 50% it will be higher than a random score from the other category. Note this is not the same as a median test. A very short example.
If we have the scores 40, 50 and 60 in group A, and the scores 30, 50, and 51 in group B, the median of each group is 50. However if we pick 40 from group A, there is a 1/3 chance it is higher than a value from group B. If we pick 50 there is also a 1/3 chance, and if we pick 60 there is a 3/3 chance. Together, there is a (1/3+1/3+3/3)/3 = 5/9 chance that if you pick a random value from group A and B, that the one in group A is higher. This is quite off from the 50%.
The test could therefor be used if we have a binary and an ordinal variable. The Mann-Whitney U test is probably the most famous one used in these scenarios, but actually has quite some limitations, and this Brunner-Munzel test might be better suited in many cases. The Fligner-Policello test is another test that is sometimes used.
Brunner and Munzel (p. 21) indicate the test-statistic that is computed follows a standard normal distribution, if each category has 50 or more data points. They also remark (p. 22) that the test is no longer accurate if sample sizes are less than 10, although in Schüürhuis et al. (2025, p. 18) 15 is listed. Neubert and Brunner (2007) propose to use a studentized permutation test in these cases. Schüürhuis et al. (2025) developed an improved version of this test as well, called a C^2 test.
This function is shown in this YouTube video and the test is also described at PeterStatistics.com.
Parameters
bin_field:pandas series- data with categories for the rows
ord_field:pandas series- data with the scores (ordinal field)
categories:listordictionary, optional- the two categories to use from bin_field. If not set the first two found will be used
levels:listordictionary, optional- the scores in order
distribution:{"t", "z"}, optional- the use of the student t-distribution or the standard normal z-distribution.
Returns
A dataframe with:
- var. est., the estimated overall variance
- min n, the sample size of the category with the lowest count
- test-statistic, the Brunner-Munzel statistic
- df, the degrees of freedom
- p-value, the significance (p-value), two-tailed
- categories, the two categories that were used
- test used, description of the test used and distribution.
Notes
The test-statistic is calculated using (Brunner & Munzel, 2000, p. 21, eq. 4.8): W_n^{BM} = \frac{\bar{R}_2 - \bar{R}_1}{\sqrt{N \times \hat{\sigma}_N^2}}
with degrees of freedom (Brunner & Munzel, 2000, p. 22): df = \frac{\hat{\sigma}_N^4}{N^2 \times \sum_{k=1}^2 \frac{\hat{\sigma}_k^4}{\left(n_k - 1\right)\times n_k^2}}
The total estimated variance (Brunner & Munzel, 2000, p. 21, eq. 4.7): \hat{\sigma}_N^2 = N\times\left(\frac{\hat{\sigma}_1^2}{n_1} + \frac{\hat{\sigma}_2^2}{n_2}\right)
The estimated variance for each category (Brunner & Munzel, 2000, p. 20, eq. 4.6): \hat{\sigma}_k^2 = \frac{s_{R_k^*}^2}{\left(N - n_k\right)^2}
the sample variance of placement values (Brunner & Munzel, 2000, p. 20, eq. 4.5): s_{R_k^*}^2 = \frac{\sum_{i=1}^{n_k} \left(R_{ik}^* - \bar{R}_{k}^*\right)^2}{n_k - 1}
mean of the placement values: \bar{R}_{k}^* = \frac{\sum_{i=1}^{n_k} R_{ik}^*}{n_k}
the placement values: R_{ik}^* = R_{ik} - R_{ik}^{(k)}
mean of the pooled ranks for each category: \bar{R}_k = \frac{\sum_{i=1}^{n_k} R_{ik}}{n_k}
Symbols used:
- R_{ik}, the rank of the i-th score in category k, when using all combined scores
- R_{ik}^{(k)}, the rank of the i-th score in category k, when using only scores from category k
- N, the total sample size
- n_{k}, the number of scores in category k
Before, After and Alternatives
Before running the test you might first want to get an impression using a cross table and/or some visualisation:
After the test you might want an effect size measure:
- es_common_language_is for Common Language Effect Size
- me_hodges_lehmann_is for Hodges-Lehmann estimate
- r_rank_biserial_is for (Glass) Rank Biserial (Cliff delta)
Alternatives for this test could be:
- ts_mann_whitney for the Mann-Whitney U test
- ts_fligner_policello for the Fligner-Policello test
- ts_brunner_munzel_perm for the Brunner-Munzel Permutation test
- ts_c_square for the C^2 test
References
Brunner, E., & Munzel, U. (2000). The nonparametric Behrens-Fisher problem: Asymptotic theory and a small-sample approximation. Biometrical Journal, 42(1), 17–25. https://doi.org/10.1002/(SICI)1521-4036(200001)42:1<17::AID-BIMJ17>3.0.CO;2-U
Neubert, K., & Brunner, E. (2007). A studentized permutation test for the non-parametric Behrens–Fisher problem. Computational Statistics & Data Analysis, 51(10), 5192–5204. https://doi.org/10.1016/j.csda.2006.05.024
Schüürhuis, S., Konietschke, F., & Brunner, E. (2025). A new approach to the Nonparametric Behrens-Fisher problem with compatible confidence intervals (arXiv:2504.01796). arXiv. https://doi.org/10.48550/arXiv.2504.01796
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076Examples
>>> pd.set_option('display.width',1000) >>> file1 = "https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv" >>> df1 = pd.read_csv(file1, sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'}) >>> coding = {'Far too little': 1, 'too little': 2, 'Enough': 3, 'Too much': 4, 'Far too much': 5} >>> ts_brunner_munzel(df1['Gen_Gender'], df1['Mix_NrAct'], levels=coding) var. est. min n test statistic df p-value categories test used 0 0.156508 11 4.465838 36.980916 0.000073 Male, Female Brunner-Munzel with t distributionExpand source code
def ts_brunner_munzel(bin_field, ord_field, categories=None, levels=None, distribution='t'): ''' Brunner-Munzel Test -------------------------------------------- The Brunner-Munzel test. The Brunner-Munzel test, tests so-called stochastic dominance. It tests if you pick a random score from the first category, will there be a higher (or lower) chance than 50% it will be higher than a random score from the other category. Note this is not the same as a median test. A very short example. If we have the scores 40, 50 and 60 in group A, and the scores 30, 50, and 51 in group B, the median of each group is 50. However if we pick 40 from group A, there is a 1/3 chance it is higher than a value from group B. If we pick 50 there is also a 1/3 chance, and if we pick 60 there is a 3/3 chance. Together, there is a (1/3+1/3+3/3)/3 = 5/9 chance that if you pick a random value from group A and B, that the one in group A is higher. This is quite off from the 50%. The test could therefor be used if we have a binary and an ordinal variable. The Mann-Whitney U test is probably the most famous one used in these scenarios, but actually has quite some limitations, and this Brunner-Munzel test might be better suited in many cases. The Fligner-Policello test is another test that is sometimes used. Brunner and Munzel (p. 21) indicate the test-statistic that is computed follows a standard normal distribution, if each category has 50 or more data points. They also remark (p. 22) that the test is no longer accurate if sample sizes are less than 10, although in Schüürhuis et al. (2025, p. 18) 15 is listed. Neubert and Brunner (2007) propose to use a studentized permutation test in these cases. Schüürhuis et al. (2025) developed an improved version of this test as well, called a \(C^2\) test. This function is shown in this [YouTube video](https://youtu.be/qwugQCbXtnM) and the test is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Tests/BrunnerMunzelTest.html). 
Parameters ---------- bin_field : pandas series data with categories for the rows ord_field : pandas series data with the scores (ordinal field) categories : list or dictionary, optional the two categories to use from bin_field. If not set the first two found will be used levels : list or dictionary, optional the scores in order distribution : {"t", "z"}, optional the use of the student t-distribution or the standard normal z-distribution. Returns ------- A dataframe with: * *var. est.*, the estimated overall variance * *min n*, the sample size of the category with the lowest count * *test-statistic*, the Brunner-Munzel statistic * *df*, the degrees of freedom * *p-value*, the significance (p-value), two-tailed * *categories*, the two categories that were used * *test used*, description of the test used and distribution. Notes ----- The test-statistic is calculated using (Brunner & Munzel, 2000, p. 21, eq. 4.8): $$W_n^{BM} = \\frac{\\bar{R}_2 - \\bar{R}_1}{\\sqrt{N \\times \\hat{\\sigma}_N^2}}$$ with degrees of freedom (Brunner & Munzel, 2000, p. 22): $$df = \\frac{\\hat{\\sigma}_N^4}{N^2 \\times \\sum_{k=1}^2 \\frac{\\hat{\\sigma}_k^4}{\\left(n_k - 1\\right)\\times n_k^2}}$$ The total estimated variance (Brunner & Munzel, 2000, p. 21, eq. 4.7): $$\\hat{\\sigma}_N^2 = N\\times\\left(\\frac{\\hat{\\sigma}_1^2}{n_1} + \\frac{\\hat{\\sigma}_2^2}{n_2}\\right)$$ The estimated variance for each category (Brunner & Munzel, 2000, p. 20, eq. 4.6): $$\\hat{\\sigma}_k^2 = \\frac{s_{R_k^*}^2}{\\left(N - n_k\\right)^2}$$ the sample variance of placement values (Brunner & Munzel, 2000, p. 20, eq. 
4.5): $$s_{R_k^*}^2 = \\frac{\\sum_{i=1}^{n_k} \\left(R_{ik}^* - \\bar{R}_{k}^*\\right)^2}{n_k - 1}$$ mean of the placement values: $$\\bar{R}_{k}^* = \\frac{\\sum_{i=1}^{n_k} R_{ik}^*}{n_k}$$ the placement values: $$R_{ik}^* = R_{ik} - R_{ik}^{(k)}$$ mean of the pooled ranks for each category: $$\\bar{R}_k = \\frac{\\sum_{i=1}^{n_k} R_{ik}}{n_k}$$ *Symbols used:* * \\(R_{ik}\\), the rank of the i-th score in category k, when using all combined scores * \\(R_{ik}^{(k)}\\), the rank of the i-th score in category k, when using only scores from category k * \\(N\\), the total sample size * \\(n_{k}\\), the number of scores in category k Before, After and Alternatives ------------------------------ Before running the test you might first want to get an impression using a cross table and/or some visualisation: * [tab_cross](../other/table_cross.html#tab_cross) * [vi_bar_stacked_multiple](../visualisations/vis_bar_stacked_multiple.html#vi_bar_stacked_multiple) After the test you might want an effect size measure: * [es_common_language_is](../effect_sizes/eff_size_common_language_is.html#es_common_language_is) for Common Language Effect Size * [me_hodges_lehmann_is](../measures/meas_hodges_lehmann_is.html#me_hodges_lehmann_is) for Hodges-Lehmann estimate * [r_rank_biserial_is](../correlations/cor_rank_biserial_is.html#r_rank_biserial_is) for (Glass) Rank Biserial (Cliff delta) Alternatives for this test could be: * [ts_mann_whitney](../tests/test_mann_whitney.html#ts_mann_whitney) for the Mann-Whitney U test * [ts_fligner_policello](../tests/test_fligner_policello.html#ts_fligner_policello) for the Fligner-Policello test * [ts_brunner_munzel_perm](../tests/test_brunner_munzel.html#ts_brunner_munzel_perm) for the Brunner-Munzel Permutation test * [ts_c_square](../tests/test_c_square.html#ts_c_square) for the \\(C^2\\) test References ---------- Brunner, E., & Munzel, U. (2000). The nonparametric Behrens-Fisher problem: Asymptotic theory and a small-sample approximation. 
*Biometrical Journal, 42*(1), 17–25. https://doi.org/10.1002/(SICI)1521-4036(200001)42:1<17::AID-BIMJ17>3.0.CO;2-U Neubert, K., & Brunner, E. (2007). A studentized permutation test for the non-parametric Behrens–Fisher problem. *Computational Statistics & Data Analysis, 51*(10), 5192–5204. https://doi.org/10.1016/j.csda.2006.05.024 Schüürhuis, S., Konietschke, F., & Brunner, E. (2025). A new approach to the Nonparametric Behrens-Fisher problem with compatible confidence intervals (arXiv:2504.01796). arXiv. https://doi.org/10.48550/arXiv.2504.01796 Author ------ Made by P. Stikker Companion website: https://PeterStatistics.com YouTube channel: https://www.youtube.com/stikpet Donations: https://www.patreon.com/bePatron?u=19398076 Examples -------- >>> pd.set_option('display.width',1000) >>> file1 = "https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv" >>> df1 = pd.read_csv(file1, sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'}) >>> coding = {'Far too little': 1, 'too little': 2, 'Enough': 3, 'Too much': 4, 'Far too much': 5} >>> ts_brunner_munzel(df1['Gen_Gender'], df1['Mix_NrAct'], levels=coding) var. est. 
min n test statistic df p-value categories test used 0 0.156508 11 4.465838 36.980916 0.000073 Male, Female Brunner-Munzel with t distribution ''' # DATA PREPARATION #convert to pandas series if needed if type(bin_field) is list: bin_field = pd.Series(bin_field) if type(ord_field) is list: ord_field = pd.Series(ord_field) #combine as one dataframe df = pd.concat([bin_field, ord_field], axis=1) df = df.dropna() df.columns=['cat', 'score'] #replace the ordinal values if levels is provided if levels is not None: df['score'] = df['score'].map(levels).astype('Int8') else: df.iloc[:,1] = pd.to_numeric(df.iloc[:,1] ) #the two categories if categories is not None: cat1 = categories[0] cat2 = categories[1] else: cat1 = df.iloc[:,0].value_counts().index[0] cat2 = df.iloc[:,0].value_counts().index[1] # remove scores not in either of the two categories df = df[df['cat'].isin([cat1, cat2])] # RANKING # get the mid-ranks of the pooled scores (R_ik) df['rank'] = df['score'].rank() # the mid-ranks within each category (R_ik^(k)) df['rank_within'] = df.groupby('cat')['score'].rank() # the placement scores (R_ik^*) df['placement'] = df['rank'] - df['rank_within'] # TEST # the unbiased variances for each category (s_Rk*^2) unbVarRast = df.groupby('cat')['placement'].var(ddof=1) # the counts nk = df['cat'].value_counts() # the overall sample size N = sum(nk) # estimated variance for each category (sigma_k^2) vark = unbVarRast / (N - nk)**2 # the overall estimated variance varN = N*sum(vark/nk) # the mean mid-rank for each category (based on pooled ranks) barRk = df.groupby('cat')['rank'].mean() # the test statistic W = (barRk.iloc[1] - barRk.iloc[0])/(N*varN)**0.5 # minimum n min_n = min(nk) if distribution == 't': if min_n < 10: test_used = 'Brunner-Munzel with t distribution (not recommended)' else: test_used = 'Brunner-Munzel with t distribution' # the degrees of freedom deg_fr = varN**2/(N**2*sum(vark**2/((nk - 1)*nk**2))) # the two-tailed p-value p_val = 2*(1 - t.cdf(abs(W), 
deg_fr)) elif distribution == 'z': if min_n < 50: test_used = 'Brunner-Munzel with standard normal distribution (not recommended)' else: test_used = 'Brunner-Munzel with standard normal distribution' # the two-tailed p-value p_val = 2*(1 - norm.cdf(abs(W))) deg_fr = 'n.a.' results = {'var. est.':varN, 'min n':min_n, 'test statistic':W, 'df':deg_fr, 'p-value':p_val, 'categories':cat1 + ', ' + cat2, 'test used':test_used} results_pd = pd.DataFrame([results]) return results_pd def ts_brunner_munzel_perm(bin_field, ord_field, categories=None, levels=None, n_iter=1000)-
Brunner-Munzel Permutation Test
This function performs a studentized permutation test of the Brunner-Munzel test. It is recommended to use this test above the regular version if sample sizes are too small in the data (e.g. less than 10 (or 15)).
Parameters
bin_field:pandas series- data with categories for the rows
ord_field:pandas series- data with the scores (ordinal field)
categories:listordictionary, optional- the two categories to use from bin_field. If not set the first two found will be used
levels:listordictionary, optional- the scores in order
n_iter:integer, optional- number of iterations.
Returns
A dataframe with:
- iters, the number of iterations
- observed, the observed Brunner-Munzel statistic
- n above, the number of permutations that had a value above but not equal equal to the observed statistic
- n below, the number of permutations that had a value below but not equal equal to the observed statistic
- p-appr., the significance (p-value), two-tailed
Notes
The idea for a studentized permutation test was proposed by Neubert and Brunner (2007). The regular Brunner-Munzel test statistic is calculated, then the categories get shuffled and the statistic is re-calculated. This repeats.
The minimum of the number of permutations above vs. below the observed statistic is multiplied by two and then divided by the number of iterations, to obtain the p-value (Schüürhuis et al., 2025, p. 7)
See the ts_brunner_munzel for more details.
Before, After and Alternatives
Before running the test you might first want to get an impression using a cross table and/or some visualisation: * tab_cross * vi_bar_stacked_multiple
After the test you might want an effect size measure: * es_common_language_is for Common Language Effect Size * es_hodges_lehmann_is for Hodges-Lehmann * r_rank_biserial_is for (Glass) Rank Biserial (Cliff delta)
Alternatives for this test could be: * ts_mann_whitney for the Mann-Whitney U test * ts_fligner_policello for the Fligner-Policello test
References
Neubert, K., & Brunner, E. (2007). A studentized permutation test for the non-parametric Behrens–Fisher problem. Computational Statistics & Data Analysis, 51(10), 5192–5204. https://doi.org/10.1016/j.csda.2006.05.024
Schüürhuis, S., Konietschke, F., & Brunner, E. (2025). A new approach to the Nonparametric Behrens-Fisher problem with compatible confidence intervals (arXiv:2504.01796). arXiv. https://doi.org/10.48550/arXiv.2504.01796
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
def ts_brunner_munzel_perm(bin_field, ord_field, categories=None, levels=None, n_iter=1000):
    '''
    Brunner-Munzel Permutation Test
    --------------------------------------------
    This function performs a studentized permutation version of the Brunner-Munzel test. It is recommended over the regular version when sample sizes are small (e.g. fewer than 10, or 15, data points per category).

    Parameters
    ----------
    bin_field : pandas series
        data with categories for the rows
    ord_field : pandas series
        data with the scores (ordinal field)
    categories : list or dictionary, optional
        the two categories to use from bin_field. If not set, the first two found will be used
    levels : list or dictionary, optional
        the scores in order
    n_iter : integer, optional
        number of iterations

    Returns
    -------
    A dataframe with:

    * *iters*, the number of iterations
    * *observed*, the observed Brunner-Munzel statistic
    * *n above*, the number of permutations with a value above, but not equal to, the observed statistic
    * *n below*, the number of permutations with a value below, but not equal to, the observed statistic
    * *p-appr.*, the significance (p-value), two-tailed

    Notes
    -----
    The idea for a studentized permutation test was proposed by Neubert and Brunner (2007). The regular Brunner-Munzel test statistic is calculated, then the categories are shuffled and the statistic is re-calculated. This is repeated for the requested number of iterations.

    The minimum of the number of permutations above vs. below the observed statistic is multiplied by two and then divided by the number of iterations, to obtain the p-value (Schüürhuis et al., 2025, p. 7).

    See ts_brunner_munzel for more details.

    Before, After and Alternatives
    ------------------------------
    Before running the test you might first want to get an impression using a cross table and/or some visualisation:

    * [tab_cross](../other/table_cross.html#tab_cross)
    * [vi_bar_stacked_multiple](../visualisations/vis_bar_stacked_multiple.html#vi_bar_stacked_multiple)

    After the test you might want an effect size measure:

    * [es_common_language_is](../effect_sizes/eff_size_common_language_is.html#es_common_language_is) for Common Language Effect Size
    * [es_hodges_lehmann_is](../effect_sizes/eff_size_hodges_lehmann_is.html#es_hodges_lehmann_is) for Hodges-Lehmann
    * [r_rank_biserial_is](../correlations/cor_rank_biserial_is.html#r_rank_biserial_is) for (Glass) Rank Biserial (Cliff delta)

    Alternatives for this test could be:

    * [ts_mann_whitney](../tests/test_mann_whitney.html#ts_mann_whitney) for the Mann-Whitney U test
    * [ts_fligner_policello](../tests/test_fligner_policello.html#ts_fligner_policello) for the Fligner-Policello test

    References
    ----------
    Neubert, K., & Brunner, E. (2007). A studentized permutation test for the non-parametric Behrens–Fisher problem. *Computational Statistics & Data Analysis, 51*(10), 5192–5204. https://doi.org/10.1016/j.csda.2006.05.024

    Schüürhuis, S., Konietschke, F., & Brunner, E. (2025). A new approach to the Nonparametric Behrens-Fisher problem with compatible confidence intervals (arXiv:2504.01796). arXiv. https://doi.org/10.48550/arXiv.2504.01796

    Author
    ------
    Made by P. Stikker

    Companion website: https://PeterStatistics.com

    YouTube channel: https://www.youtube.com/stikpet

    Donations: https://www.patreon.com/bePatron?u=19398076
    '''
    # DATA PREPARATION
    # convert to pandas series if needed
    if type(bin_field) is list:
        bin_field = pd.Series(bin_field)
    if type(ord_field) is list:
        ord_field = pd.Series(ord_field)

    # combine as one dataframe
    df = pd.concat([bin_field, ord_field], axis=1)
    df = df.dropna()
    df.columns = ['cat', 'score']

    # replace the ordinal values if levels is provided
    if levels is not None:
        df['score'] = df['score'].map(levels).astype('Int8')
    else:
        df.iloc[:, 1] = pd.to_numeric(df.iloc[:, 1])

    # the two categories
    if categories is not None:
        cat1 = categories[0]
        cat2 = categories[1]
    else:
        cat1 = df.iloc[:, 0].value_counts().index[0]
        cat2 = df.iloc[:, 0].value_counts().index[1]

    # remove scores not in either of the two categories
    df = df[df['cat'].isin([cat1, cat2])]

    # get the mid-ranks of the pooled scores (R_ik)
    df['rank'] = df['score'].rank()
    # the counts
    nk = df['cat'].value_counts()
    # the overall sample size
    N = sum(nk)

    Ws = []
    for i in range(n_iter + 1):
        # the mid-ranks within each category (R_ik^(k))
        df['rank_within'] = df.groupby('cat')['score'].rank()
        # the placement scores (R_ik^*)
        df['placement'] = df['rank'] - df['rank_within']
        # the unbiased variances for each category (s_Rk*^2)
        unbVarRast = df.groupby('cat')['placement'].var(ddof=1)
        # estimated variance for each category (sigma_k^2)
        vark = unbVarRast / (N - nk)**2
        # the overall estimated variance
        varN = N * sum(vark / nk)
        # the mean mid-rank for each category (based on pooled ranks)
        barRk = df.groupby('cat')['rank'].mean()
        # the test statistic (the first pass uses the observed labels)
        W = (barRk.iloc[1] - barRk.iloc[0]) / (N * varN)**0.5
        Ws.append(W)
        # shuffle the category labels for the next permutation
        df['cat'] = df['cat'].sample(frac=1).values

    # Ws[0] is the observed statistic; Ws[1:] are the permuted ones
    n_above = sum(i > Ws[0] for i in Ws[1:n_iter + 1])
    n_below = sum(i < Ws[0] for i in Ws[1:n_iter + 1])
    p_appr = 2 * min(n_above, n_below) / n_iter

    results = {'iters': n_iter, 'observed': Ws[0], 'n above': n_above,
               'n below': n_below, 'p-appr.': p_appr}
    results_pd = pd.DataFrame([results])

    return results_pd