Module stikpetP.distributions.dist_mann_whitney_wilcoxon
import math
def di_mwwpmf(u, n1, n2):
    '''
    Mann-Whitney-Wilcoxon Probability Mass Function
    ------------------------------------------------------
    This function returns the probability of the specified U statistic, given n1 and n2 cases in each category.

    It first uses the di_mwwf function to determine the count for the u value, and then divides that count by the total number of possible arrangements.

    Parameters
    ----------
    u : int
        the U test statistic
    n1 : int
        the sample size of the first category
    n2 : int
        the sample size of the second category

    Returns
    -------
    p : float
        the probability of the specified U value

    Notes
    -----
    To convert a W statistic to a U statistic use:
    $$U = W - \\frac{n_1\\times\\left(n_1 + 1\\right)}{2}$$

    See the details in di_mwwf() on how the frequency is determined. This is then divided by the total number of possibilities, which is the number of ways we can choose $n_1$ items out of $n = n_1 + n_2$, without replacement. This is the binomial coefficient, or number of combinations:
    $$C(n, n_1) = nCr(n, n_1) = \\binom{n}{n_1} = \\frac{n!}{n_1!\\times\\left(n - n_1\\right)!}$$

    Author
    ------
    Made by P. Stikker

    Companion website: https://PeterStatistics.com
    YouTube channel: https://www.youtube.com/stikpet
    Donations: https://www.patreon.com/bePatron?u=19398076
    '''
    return di_mwwf(u, n1, n2)/math.comb(n1 + n2, n1)

def di_mwwcdf(u, n1, n2):
    '''
    Mann-Whitney-Wilcoxon Cumulative Distribution Function
    ------------------------------------------------------
    This function returns the cumulative probability for the specified U statistic, given n1 and n2 cases in each category.

    It first uses the di_mwwd function to determine the distribution up to the specified u value, sums these counts, and divides the sum by the total number of possible arrangements.

    Parameters
    ----------
    u : int
        the U test statistic
    n1 : int
        the sample size of the first category
    n2 : int
        the sample size of the second category

    Returns
    -------
    p : float
        the cumulative probability

    Notes
    -----
    See the details in di_mwwd() on how the frequency distribution is determined. The sum of these frequencies is then divided by the total number of possibilities, which is the number of ways we can choose $n_1$ items out of $n = n_1 + n_2$, without replacement. This is the binomial coefficient, or number of combinations:
    $$C(n, n_1) = nCr(n, n_1) = \\binom{n}{n_1} = \\frac{n!}{n_1!\\times\\left(n - n_1\\right)!}$$

    To convert a W statistic to a U statistic use:
    $$U = W - \\frac{n_1\\times\\left(n_1 + 1\\right)}{2}$$

    Author
    ------
    Made by P. Stikker

    Companion website: https://PeterStatistics.com
    YouTube channel: https://www.youtube.com/stikpet
    Donations: https://www.patreon.com/bePatron?u=19398076
    '''
    # determine the frequency of every U value from 0 up to u
    dist = di_mwwd(u, n1, n2)
    # divide the sum of those by the total number of possibilities
    p = sum(dist)/math.comb(n1 + n2, n2)
    return p

def di_mwwf(u, n1, n2, memo=None):
    '''
    Mann-Whitney-Wilcoxon Count
    ---------------------------
    This function returns the number of possible ways to obtain a specified U value, given n1 and n2 cases in each category.

    Parameters
    ----------
    u : int
        the U test statistic
    n1 : int
        the sample size of the first category
    n2 : int
        the sample size of the second category
    memo : dict, optional
        cache of counts that were already computed, passed along by the recursive calls

    Returns
    -------
    result : int
        the number of arrangements that give the specified U value

    Notes
    -----
    A recursive formula is used:
    $$f_{n_1, n_2}(U) = \\begin{cases} 0 & \\text{ if } U < 0 \\text{ or } U > n_1\\times n_2 \\\\ 1 & \\text{ if } (n_1=1 \\text{ or } n_2=1) \\text{ and } 0 \\leq U \\leq n_1\\times n_2 \\\\ f_{n_1, n_2-1}(U) + f_{n_1-1, n_2}(U - n_2) & \\text{ else } \\end{cases} $$

    This formula is found in Mann and Whitney (1947, p. 51) and Dinneen and Blakesley (1973, p. 269), and is also described in Festinger (1946).

    To convert a W statistic to a U statistic use:
    $$U = W - \\frac{n_1\\times\\left(n_1 + 1\\right)}{2}$$

    References
    ----------
    Dinneen, L. C., & Blakesley, B. C. (1973). Algorithm AS 62: A generator for the sampling distribution of the Mann-Whitney U statistic. *Journal of the Royal Statistical Society. Series C (Applied Statistics), 22*(2), 269–273. doi:10.2307/2346934

    Festinger, L. (1946). The significance of difference between means without reference to the frequency distribution function. *Psychometrika, 11*(2), 97–105. doi:10.1007/BF02288926

    Mann, H. B., & Whitney, D. R. (1947). On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. *The Annals of Mathematical Statistics, 18*(1), 50–60. doi:10.1214/aoms/1177730491

    Author
    ------
    Made by P. Stikker

    Companion website: https://PeterStatistics.com
    YouTube channel: https://www.youtube.com/stikpet
    Donations: https://www.patreon.com/bePatron?u=19398076
    '''
    if memo is None:
        memo = {}

    # Check if result is already computed
    if (u, n1, n2) in memo:
        return memo[(u, n1, n2)]

    # Base cases
    if u < 0 or u > n1 * n2:
        result = 0
    elif n1 == 1 or n2 == 1:
        result = 1
    else:
        # Recursive case
        result = di_mwwf(u, n1, n2-1, memo) + di_mwwf(u - n2, n1-1, n2, memo)

    # Memoize the result
    memo[(u, n1, n2)] = result
    return result

def di_mwwd(u, n1, n2):
    '''
    Mann-Whitney-Wilcoxon Distribution
    -----------------------------------
    This distribution is also referred to as a permutation distribution.

    It is used in the Mann-Whitney U and Wilcoxon Rank Sum test.

    In this version the U statistic and the sample sizes of the two categories are used as input. This function returns the counts (frequencies) of each possible U value from 0 up to the provided u value. If all possible values need to be shown, simply set u = n1*n2.

    Parameters
    ----------
    u : int
        the U test statistic
    n1 : int
        the sample size of the first category
    n2 : int
        the sample size of the second category

    Returns
    -------
    result : list of int
        the counts, starting with the count for U=0

    Notes
    -----
    A recursive formula for this is:
    $$f_{n_1, n_2}(U) = \\begin{cases} 0 & \\text{ if } U < 0 \\text{ or } U > n_1\\times n_2 \\\\ 1 & \\text{ if } (n_1=1 \\text{ or } n_2=1) \\text{ and } 0 \\leq U \\leq n_1\\times n_2 \\\\ f_{n_1, n_2-1}(U) + f_{n_1-1, n_2}(U - n_2) & \\text{ else } \\end{cases} $$

    This formula is found in Mann and Whitney (1947, p. 51) and Dinneen and Blakesley (1973, p. 269), and is also described in Festinger (1946).

    To convert a W statistic to a U statistic use:
    $$U = W - \\frac{n_1\\times\\left(n_1 + 1\\right)}{2}$$

    References
    ----------
    Dinneen, L. C., & Blakesley, B. C. (1973). Algorithm AS 62: A generator for the sampling distribution of the Mann-Whitney U statistic. *Journal of the Royal Statistical Society. Series C (Applied Statistics), 22*(2), 269–273. doi:10.2307/2346934

    Festinger, L. (1946). The significance of difference between means without reference to the frequency distribution function. *Psychometrika, 11*(2), 97–105. doi:10.1007/BF02288926

    Mann, H. B., & Whitney, D. R. (1947). On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. *The Annals of Mathematical Statistics, 18*(1), 50–60. doi:10.1214/aoms/1177730491

    Author
    ------
    Made by P. Stikker

    Companion website: https://PeterStatistics.com
    YouTube channel: https://www.youtube.com/stikpet
    Donations: https://www.patreon.com/bePatron?u=19398076
    '''
    # Create a 3D DP table with dimensions (u+1) x (n1+1) x (n2+1)
    dp = [[[0] * (n2+1) for _ in range(n1+1)] for _ in range(u+1)]

    # Fill the table in order of increasing U value and sample sizes
    for x in range(u+1):
        for y in range(n1+1):
            for z in range(n2+1):
                if x > y * z:
                    # base case: U cannot exceed the product of the sample sizes
                    dp[x][y][z] = 0
                elif y == 1 or z == 1:
                    # base case: with one case in a category each U occurs once
                    dp[x][y][z] = 1
                else:
                    # recurrence relation; the second term is 0 if x - z < 0
                    dp[x][y][z] = dp[x][y][z-1] + (dp[x - z][y-1][z] if x - z >= 0 else 0)

    # Return the distribution of results from 0 to u
    return [dp[x][n1][n2] for x in range(u+1)]

Functions
def di_mwwcdf(u, n1, n2)
Mann-Whitney-Wilcoxon Cumulative Distribution Function
This function returns the cumulative probability for the specified U statistic, given n1 and n2 cases in each category.
It first uses the di_mwwd function to determine the distribution up to the specified u value, sums these counts, and divides the sum by the total number of possible arrangements.
Parameters
u : int
    the U test statistic
n1 : int
    the sample size of the first category
n2 : int
    the sample size of the second category
Returns
p : the cumulative probability
Notes
See the details in di_mwwd() on how the frequency distribution is determined. The sum of these frequencies is then divided by the total number of possibilities, which is the number of ways we can choose $n_1$ items out of $n = n_1 + n_2$, without replacement. This is the binomial coefficient, or number of combinations:
$$C(n, n_1) = nCr(n, n_1) = \binom{n}{n_1} = \frac{n!}{n_1!\times\left(n - n_1\right)!}$$
To convert a W statistic to a U statistic use:
$$U = W - \frac{n_1\times\left(n_1 + 1\right)}{2}$$
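The conversion from W to U can be sketched as below. The helper name `w_to_u` is hypothetical and not part of this module; the sketch assumes W is the integer rank sum of the first category (no ties), so integer division is exact.

```python
def w_to_u(w, n1):
    """Convert a rank sum W of the first category into a U statistic.

    Hypothetical helper for illustration; assumes an integer W (no ties).
    """
    # n1 * (n1 + 1) is always even, so the integer division is exact
    return w - n1 * (n1 + 1) // 2


# The first category (n1 = 3) holds ranks 1, 2 and 4, so W = 7
w = 1 + 2 + 4
print(w_to_u(w, 3))  # 7 - 6 = 1
```

The resulting U value can then be passed as `u` to the functions on this page.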
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
def di_mwwd(u, n1, n2)
Mann-Whitney-Wilcoxon Distribution
This distribution is also referred to as a permutation distribution.
It is used in the Mann-Whitney U and Wilcoxon Rank Sum test.
In this version the U statistic and the sample sizes of the two categories are used as input. This function returns the counts (frequencies) of each possible U value from 0 up to the provided u value. If all possible values need to be shown, simply set u = n1*n2.
Parameters
u : int
    the U test statistic
n1 : int
    the sample size of the first category
n2 : int
    the sample size of the second category
Returns
result : a list with the counts starting with the count for U=0
Notes
A recursive formula for this is:
$$f_{n_1, n_2}(U) = \begin{cases} 0 & \text{ if } U < 0 \text{ or } U > n_1\times n_2 \\ 1 & \text{ if } (n_1=1 \text{ or } n_2=1) \text{ and } 0 \leq U \leq n_1\times n_2 \\ f_{n_1, n_2-1}(U) + f_{n_1-1, n_2}(U - n_2) & \text{ else } \end{cases}$$
This formula is found in Mann and Whitney (1947, p. 51) and Dinneen and Blakesley (1973, p. 269), and is also described in Festinger (1946).
To convert a W statistic to a U statistic use:
$$U = W - \frac{n_1\times\left(n_1 + 1\right)}{2}$$
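To see what the counts represent, the distribution can also be obtained by brute force for small samples. The sketch below is not how this module computes it; `brute_force_counts` is a hypothetical name. It enumerates every way to assign n1 of the ranks 1..n to the first category, converts each rank sum W to U, and tallies the results.

```python
from collections import Counter
from itertools import combinations


def brute_force_counts(n1, n2):
    """Count how often each U value occurs over all possible rank assignments.

    Hypothetical helper for illustration; only practical for small n1 and n2.
    """
    n = n1 + n2
    offset = n1 * (n1 + 1) // 2  # smallest possible rank sum W
    tally = Counter(
        sum(ranks) - offset  # U = W - n1*(n1+1)/2
        for ranks in combinations(range(1, n + 1), n1)
    )
    # A Counter returns 0 for U values that never occur
    return [tally[u] for u in range(n1 * n2 + 1)]


print(brute_force_counts(2, 2))  # [1, 1, 2, 1, 1]
```

The recursive formula gives the same counts without enumerating all arrangements, which matters once the sample sizes grow.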
References
Dinneen, L. C., & Blakesley, B. C. (1973). Algorithm AS 62: A generator for the sampling distribution of the Mann-Whitney U statistic. Journal of the Royal Statistical Society. Series C (Applied Statistics), 22(2), 269–273. doi:10.2307/2346934
Festinger, L. (1946). The significance of difference between means without reference to the frequency distribution function. Psychometrika, 11(2), 97–105. doi:10.1007/BF02288926
Mann, H. B., & Whitney, D. R. (1947). On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. The Annals of Mathematical Statistics, 18(1), 50–60. doi:10.1214/aoms/1177730491
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
def di_mwwf(u, n1, n2, memo=None)
Mann-Whitney-Wilcoxon Count
This function returns the number of possible ways to obtain a specified U value, given n1 and n2 cases in each category.
Parameters
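u : int
    the U test statistic
n1 : int
    the sample size of the first category
n2 : int
    the sample size of the second category
memo : dict, optional
    cache of counts that were already computed, passed along by the recursive calls
Returns
result : int
    the number of arrangements that give the specified U value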
Notes
A recursive formula is used:
$$f_{n_1, n_2}(U) = \begin{cases} 0 & \text{ if } U < 0 \text{ or } U > n_1\times n_2 \\ 1 & \text{ if } (n_1=1 \text{ or } n_2=1) \text{ and } 0 \leq U \leq n_1\times n_2 \\ f_{n_1, n_2-1}(U) + f_{n_1-1, n_2}(U - n_2) & \text{ else } \end{cases}$$
This formula is found in Mann and Whitney (1947, p. 51) and Dinneen and Blakesley (1973, p. 269), and is also described in Festinger (1946).
To convert a W statistic to a U statistic use:
$$U = W - \frac{n_1\times\left(n_1 + 1\right)}{2}$$
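The recurrence above can be written compactly with `functools.lru_cache` from the standard library doing the memoization. This is an alternative sketch, not the module's implementation (which passes an explicit `memo` dictionary); the name `count_u` is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_u(u, n1, n2):
    """Number of arrangements giving the value u (hypothetical sketch)."""
    # Outside the attainable range there are no arrangements
    if u < 0 or u > n1 * n2:
        return 0
    # With a single case in one category every attainable U occurs once
    if n1 == 1 or n2 == 1:
        return 1
    # Recursive case, as in the formula above
    return count_u(u, n1, n2 - 1) + count_u(u - n2, n1 - 1, n2)


print(count_u(2, 2, 2))  # 2
```

Summing `count_u(u, n1, n2)` over all u from 0 to n1*n2 gives the binomial coefficient for choosing n1 out of n1 + n2, which is a quick sanity check on the recurrence.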
References
Dinneen, L. C., & Blakesley, B. C. (1973). Algorithm AS 62: A generator for the sampling distribution of the Mann-Whitney U statistic. Journal of the Royal Statistical Society. Series C (Applied Statistics), 22(2), 269–273. doi:10.2307/2346934
Festinger, L. (1946). The significance of difference between means without reference to the frequency distribution function. Psychometrika, 11(2), 97–105. doi:10.1007/BF02288926
Mann, H. B., & Whitney, D. R. (1947). On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. The Annals of Mathematical Statistics, 18(1), 50–60. doi:10.1214/aoms/1177730491
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
def di_mwwpmf(u, n1, n2)
Mann-Whitney-Wilcoxon Probability Mass Function
This function returns the probability of the specified U statistic, given n1 and n2 cases in each category.
It first uses the di_mwwf function to determine the count for the u value, and then divides that count by the total number of possible arrangements.
Parameters
u : int
    the U test statistic
n1 : int
    the sample size of the first category
n2 : int
    the sample size of the second category
Returns
p : the probability of the specified U value
Notes
To convert a W statistic to a U statistic use:
$$U = W - \frac{n_1\times\left(n_1 + 1\right)}{2}$$
See the details in di_mwwf() on how the frequency is determined. This is then divided by the total number of possibilities, which is the number of ways we can choose $n_1$ items out of $n = n_1 + n_2$, without replacement. This is the binomial coefficient, or number of combinations:
$$C(n, n_1) = nCr(n, n_1) = \binom{n}{n_1} = \frac{n!}{n_1!\times\left(n - n_1\right)!}$$
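As a small worked example, take $n_1 = n_2 = 2$. Applying the recursive formula from di_mwwf() gives the frequencies 1, 1, 2, 1, 1 for $U = 0, \dots, 4$, and the number of possible arrangements is $\binom{4}{2} = 6$. The probability of $U = 2$ is then:
$$P(U = 2) = \frac{f_{2,2}(2)}{\binom{4}{2}} = \frac{2}{6} \approx 0.333$$
Summing the frequencies up to $U = 2$ instead gives the corresponding cumulative probability:
$$P(U \leq 2) = \frac{1 + 1 + 2}{6} = \frac{4}{6} \approx 0.667$$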
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076