Module stikpetP.distributions.dist_mann_whitney_wilcoxon

import math
def di_mwwpmf(u, n1, n2):
    '''
    Mann-Whitney-Wilcoxon Probability Mass Function
    ------------------------------------------------------
    This function returns the probability for the specified U statistic, given n1 and n2 cases in each category.

    It first uses the di_mwwf function to determine the count for the u value, and divides it by the total number of possible arrangements.
    
    Parameters
    ----------
    u : int
        the U test statistic
    n1 : int
        the sample size of the first category
    n2 : int
        the sample size of the second category
    
    Returns
    -------
    p : the probability of the specified U value
    
    Notes
    -----
    To convert a W statistic to a U statistic use:
    $$U = W - \\frac{n_1\\times\\left(n_1 + 1\\right)}{2}$$

    See the details in di_mwwf() on how the frequency is determined. This count is then divided by the total number of possible arrangements, which is the number of ways to choose $n_1$ items out of $n = n_1 + n_2$ without replacement. This is the binomial coefficient, or number of combinations:
    
    $$C(n, n_1) = nCr(n, n_1) = \\binom{n}{n_1} = \\frac{n!}{n_1!\\times\\left(n - n_1\\right)!}$$

    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076
    
    '''
    return di_mwwf(u, n1, n2)/math.comb(n1 + n2, n1)
    
def di_mwwcdf(u, n1, n2):
    '''
    Mann-Whitney-Wilcoxon Cumulative Distribution Function
    ------------------------------------------------------
    This function returns the cumulative probability for the specified U statistic, given n1 and n2 cases in each category.

    It first uses the di_mwwd function to determine the distribution up to the specified u value, sums these results and divides it by the total number of possible arrangements.
    
    Parameters
    ----------
    u : int
        the U test statistic
    n1 : int
        the sample size of the first category
    n2 : int
        the sample size of the second category
    
    Returns
    -------
    p : the cumulative probability
    
    Notes
    -----
    See the details in di_mwwd() on how the frequency distribution is determined. The sum of these counts is then divided by the total number of possible arrangements, which is the number of ways to choose $n_1$ items out of $n = n_1 + n_2$ without replacement. This is the binomial coefficient, or number of combinations:
    
    $$C(n, n_1) = nCr(n, n_1) = \\binom{n}{n_1} = \\frac{n!}{n_1!\\times\\left(n - n_1\\right)!}$$

    To convert a W statistic to a U statistic use:
    $$U = W - \\frac{n_1\\times\\left(n_1 + 1\\right)}{2}$$
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076
    
    '''
    # determine the entire distribution up to U
    dist = di_mwwd(u, n1, n2)
    # divide the sum of those by the total number of possibilities
    p = sum(dist)/math.comb(n1 + n2, n2)
    return p


def di_mwwf(u, n1, n2, memo=None):
    '''
    Mann-Whitney-Wilcoxon Count 
    ---------------------------
    This function will return the number of possible ways to obtain a specified U value given n1 and n2 cases in each category.
    
    
    Parameters
    ----------
    u : int
        the U test statistic
    n1 : int
        the sample size of the first category
    n2 : int
        the sample size of the second category
    
    Returns
    -------
    result : the number of possible arrangements that give the specified U value
    
    Notes
    -----
    A recursive formula is used:
    $$f_{n_1, n_2}(U) = \\begin{cases} 0 & \\text{ if } U < 0 \\text{ or } U > n_1\\times n_2 \\\\ 1 & \\text{ if } (n_1=1 \\text{ or } n_2=1) \\text{ and } 0 \\leq U \\leq n_1\\times n_2 \\\\ f_{n_1, n_2-1}(U) + f_{n_1-1, n_2}(U - n_2) & \\text{ else } \\end{cases} $$
    
    This formula is found in Mann and Whitney (1947, p. 51) and Dinneen and Blakesley (1973, p. 269), and is also described in Festinger (1946).

    To convert a W statistic to a U statistic use:
    $$U = W - \\frac{n_1\\times\\left(n_1 + 1\\right)}{2}$$    

    References
    ----------
    Dinneen, L. C., & Blakesley, B. C. (1973). Algorithm AS 62: A generator for the sampling distribution of the Mann-Whitney U statistic. *Journal of the Royal Statistical Society. Series C (Applied Statistics), 22*(2), 269–273. doi:10.2307/2346934
    
    Festinger, L. (1946). The significance of difference between means without reference to the frequency distribution function. *Psychometrika, 11*(2), 97–105. doi:10.1007/BF02288926
    
    Mann, H. B., & Whitney, D. R. (1947). On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. *The Annals of Mathematical Statistics, 18*(1), 50–60. doi:10.1214/aoms/1177730491
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076
    
    '''
    if memo is None:
        memo = {}

    # Check if result is already computed
    if (u, n1, n2) in memo:
        return memo[(u, n1, n2)]

    # Base cases
    if u < 0 or u > n1 * n2:
        result = 0
    elif n1 == 1 or n2 == 1:
        result = 1
    else:
        # Recursive case
        result = di_mwwf(u, n1, n2-1, memo) + di_mwwf(u - n2, n1-1, n2, memo)
    
    # Memoize the result
    memo[(u, n1, n2)] = result

    return result

def di_mwwd(u, n1, n2):
    '''
    Mann-Whitney-Wilcoxon Distribution 
    -----------------------------------
    This distribution is also referred to as a permutation distribution.
    
    It is used in the Mann-Whitney U and Wilcoxon Rank Sum test.
    
    In this version the U statistic and the sample sizes of the two categories are used as input. The function returns the counts (frequencies) of each possible U value from 0 up to the provided u value. To obtain all possible values, set u = n1*n2.
    
    Parameters
    ----------
    u : int
        the U test statistic
    n1 : int
        the sample size of the first category
    n2 : int
        the sample size of the second category
    
    Returns
    -------
    result : a list with the counts starting with the count for U=0
    
    Notes
    -----
    A recursive formula for this is:
    $$f_{n_1, n_2}(U) = \\begin{cases} 0 & \\text{ if } U < 0 \\text{ or } U > n_1\\times n_2 \\\\ 1 & \\text{ if } (n_1=1 \\text{ or } n_2=1) \\text{ and } 0 \\leq U \\leq n_1\\times n_2 \\\\ f_{n_1, n_2-1}(U) + f_{n_1-1, n_2}(U - n_2) & \\text{ else } \\end{cases} $$
    
    This formula is found in Mann and Whitney (1947, p. 51) and Dinneen and Blakesley (1973, p. 269), and is also described in Festinger (1946).

    To convert a W statistic to a U statistic use:
    $$U = W - \\frac{n_1\\times\\left(n_1 + 1\\right)}{2}$$

    References
    ----------
    Dinneen, L. C., & Blakesley, B. C. (1973). Algorithm AS 62: A generator for the sampling distribution of the Mann-Whitney U statistic. *Journal of the Royal Statistical Society. Series C (Applied Statistics), 22*(2), 269–273. doi:10.2307/2346934
    
    Festinger, L. (1946). The significance of difference between means without reference to the frequency distribution function. *Psychometrika, 11*(2), 97–105. doi:10.1007/BF02288926
    
    Mann, H. B., & Whitney, D. R. (1947). On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. *The Annals of Mathematical Statistics, 18*(1), 50–60. doi:10.1214/aoms/1177730491
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076
    
    '''
    # Create a 3D DP table with dimensions (u+1) x (n1+1) x (n2+1)
    dp = [[[0] * (n2+1) for _ in range(n1+1)] for _ in range(u+1)]
    
    # Fill the table: x is the U value, y and z are the two sample sizes
    for x in range(u+1):
        for y in range(n1+1):
            for z in range(n2+1):
                if x < 0 or x > y * z:
                    # U outside the attainable range 0..y*z
                    dp[x][y][z] = 0
                elif y == 1 or z == 1:
                    # Base case: one arrangement for each attainable U
                    dp[x][y][z] = 1
                else:
                    # Recurrence: f(y, z-1)(x) + f(y-1, z)(x - z)
                    dp[x][y][z] = dp[x][y][z-1] + (dp[x - z][y-1][z] if x - z >= 0 else 0)

    # Return the distribution of results from 0 to u
    return [dp[x][n1][n2] for x in range(u+1)]

Functions

def di_mwwcdf(u, n1, n2)

Mann-Whitney-Wilcoxon Cumulative Distribution Function

This function returns the cumulative probability for the specified U statistic, given n1 and n2 cases in each category.

It first uses the di_mwwd function to determine the distribution up to the specified u value, sums these results and divides it by the total number of possible arrangements.

Parameters

u : int
the U test statistic
n1 : int
the sample size of the first category
n2 : int
the sample size of the second category

Returns

p : the cumulative probability
 

Notes

See the details in di_mwwd() on how the frequency distribution is determined. The sum of these counts is then divided by the total number of possible arrangements, which is the number of ways to choose $n_1$ items out of $n = n_1 + n_2$ without replacement. This is the binomial coefficient, or number of combinations:

$$C(n, n_1) = nCr(n, n_1) = \binom{n}{n_1} = \frac{n!}{n_1!\times\left(n - n_1\right)!}$$

To convert a W statistic to a U statistic use: $$U = W - \frac{n_1\times\left(n_1 + 1\right)}{2}$$
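
For small samples, the cumulative probability described in these notes can be checked by listing every possible rank assignment. The sketch below is not part of the module: `brute_force_cdf` is a hypothetical helper, and it assumes there are no tied values, so the pooled ranks are simply 1 to n.

```python
from itertools import combinations
from math import comb

def brute_force_cdf(u, n1, n2):
    """Hypothetical helper: P(U <= u) by listing every rank assignment."""
    n = n1 + n2
    # Smallest possible rank sum for the first category: 1 + 2 + ... + n1
    w_min = n1 * (n1 + 1) // 2
    # Count the arrangements whose U statistic does not exceed u
    favourable = sum(
        1
        for ranks in combinations(range(1, n + 1), n1)
        if sum(ranks) - w_min <= u
    )
    # Divide by the total number of arrangements, C(n, n1)
    return favourable / comb(n, n1)

# With n1 = n2 = 2, four of the six arrangements have U <= 2
print(brute_force_cdf(2, 2, 2))
```

The enumeration grows with C(n, n1), so it is only practical as a cross-check on small inputs.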

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076

def di_mwwd(u, n1, n2)

Mann-Whitney-Wilcoxon Distribution

This distribution is also referred to as a permutation distribution.

It is used in the Mann-Whitney U and Wilcoxon Rank Sum test.

In this version the U statistic and the sample sizes of the two categories are used as input. The function returns the counts (frequencies) of each possible U value from 0 up to the provided u value. To obtain all possible values, set u = n1*n2.

Parameters

u : int
the U test statistic
n1 : int
the sample size of the first category
n2 : int
the sample size of the second category

Returns

result : a list with the counts starting with the count for U=0
 

Notes

A recursive formula for this is: $$f_{n_1, n_2}(U) = \begin{cases} 0 & \text{ if } U < 0 \text{ or } U > n_1\times n_2 \\ 1 & \text{ if } (n_1=1 \text{ or } n_2=1) \text{ and } 0 \leq U \leq n_1\times n_2 \\ f_{n_1, n_2-1}(U) + f_{n_1-1, n_2}(U - n_2) & \text{ else } \end{cases}$$

This formula is found in Mann and Whitney (1947, p. 51) and Dinneen and Blakesley (1973, p. 269), and is also described in Festinger (1946).

To convert a W statistic to a U statistic use: $$U = W - \frac{n_1\times\left(n_1 + 1\right)}{2}$$
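
The recurrence above can be written almost literally as a cached recursive function. This is a minimal sketch rather than the module's own implementation: `count_u` is a hypothetical name, `functools.lru_cache` stands in for the table, and both sample sizes are assumed to be at least 1.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count_u(u, n1, n2):
    """Hypothetical helper: number of arrangements giving exactly this U."""
    if u < 0 or u > n1 * n2:
        return 0  # U outside the attainable range
    if n1 == 1 or n2 == 1:
        return 1  # one arrangement for each attainable U
    # f_{n1, n2-1}(U) + f_{n1-1, n2}(U - n2)
    return count_u(u, n1, n2 - 1) + count_u(u - n2, n1 - 1, n2)

n1, n2 = 2, 2
counts = [count_u(u, n1, n2) for u in range(n1 * n2 + 1)]
print(counts)  # [1, 1, 2, 1, 1]
# The counts over all U values add up to the number of arrangements
print(sum(counts) == comb(n1 + n2, n1))  # True
```

Listing the counts for every U from 0 to n1*n2 gives the full frequency distribution, which is symmetric around n1*n2/2.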

References

Dinneen, L. C., & Blakesley, B. C. (1973). Algorithm AS 62: A generator for the sampling distribution of the Mann-Whitney U statistic. Journal of the Royal Statistical Society. Series C (Applied Statistics), 22(2), 269–273. doi:10.2307/2346934

Festinger, L. (1946). The significance of difference between means without reference to the frequency distribution function. Psychometrika, 11(2), 97–105. doi:10.1007/BF02288926

Mann, H. B., & Whitney, D. R. (1947). On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. The Annals of Mathematical Statistics, 18(1), 50–60. doi:10.1214/aoms/1177730491

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076

def di_mwwf(u, n1, n2, memo=None)

Mann-Whitney-Wilcoxon Count

This function will return the number of possible ways to obtain a specified U value given n1 and n2 cases in each category.

Parameters

u : int
the U test statistic
n1 : int
the sample size of the first category
n2 : int
the sample size of the second category

Returns

result : the number of possible arrangements that give the specified U value
 

Notes

A recursive formula is used: $$f_{n_1, n_2}(U) = \begin{cases} 0 & \text{ if } U < 0 \text{ or } U > n_1\times n_2 \\ 1 & \text{ if } (n_1=1 \text{ or } n_2=1) \text{ and } 0 \leq U \leq n_1\times n_2 \\ f_{n_1, n_2-1}(U) + f_{n_1-1, n_2}(U - n_2) & \text{ else } \end{cases}$$

This formula is found in Mann and Whitney (1947, p. 51) and Dinneen and Blakesley (1973, p. 269), and is also described in Festinger (1946).

To convert a W statistic to a U statistic use: $$U = W - \frac{n_1\times\left(n_1 + 1\right)}{2}$$
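
The W to U conversion can be illustrated with a small made-up data set. In the sketch below, `rank_sum_to_u` is a hypothetical helper, the values are invented for illustration, and no ties are assumed, so each value gets a unique rank in the pooled sample.

```python
def rank_sum_to_u(w, n1):
    """Hypothetical helper: convert the rank sum W of the first category to U."""
    return w - n1 * (n1 + 1) // 2

# Illustrative data (made up for this example), no tied values
first = [1.2, 3.4, 2.2]
second = [0.5, 4.1, 5.0, 2.9]

pooled = sorted(first + second)
# Rank of each value in the pooled sample, starting at 1
rank = {value: position for position, value in enumerate(pooled, start=1)}

w = sum(rank[value] for value in first)  # ranks 2 + 3 + 5
u = rank_sum_to_u(w, len(first))
print(w, u)  # 10 4
```

The resulting U equals the number of pairs in which a value from the first category exceeds a value from the second, which is 4 for this data.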

References

Dinneen, L. C., & Blakesley, B. C. (1973). Algorithm AS 62: A generator for the sampling distribution of the Mann-Whitney U statistic. Journal of the Royal Statistical Society. Series C (Applied Statistics), 22(2), 269–273. doi:10.2307/2346934

Festinger, L. (1946). The significance of difference between means without reference to the frequency distribution function. Psychometrika, 11(2), 97–105. doi:10.1007/BF02288926

Mann, H. B., & Whitney, D. R. (1947). On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. The Annals of Mathematical Statistics, 18(1), 50–60. doi:10.1214/aoms/1177730491

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076

def di_mwwpmf(u, n1, n2)

Mann-Whitney-Wilcoxon Probability Mass Function

This function returns the probability for the specified U statistic, given n1 and n2 cases in each category.

It first uses the di_mwwf function to determine the count for the u value, and divides it by the total number of possible arrangements.

Parameters

u : int
the U test statistic
n1 : int
the sample size of the first category
n2 : int
the sample size of the second category

Returns

p : the probability of the specified U value
 

Notes

To convert a W statistic to a U statistic use: $$U = W - \frac{n_1\times\left(n_1 + 1\right)}{2}$$

See the details in di_mwwf() on how the frequency is determined. This count is then divided by the total number of possible arrangements, which is the number of ways to choose $n_1$ items out of $n = n_1 + n_2$ without replacement. This is the binomial coefficient, or number of combinations:

$$C(n, n_1) = nCr(n, n_1) = \binom{n}{n_1} = \frac{n!}{n_1!\times\left(n - n_1\right)!}$$
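
As a quick check of the binomial coefficient above, the factorial formula can be compared with `math.comb`. The module source divides by `math.comb(n1 + n2, n1)` in di_mwwpmf and by `math.comb(n1 + n2, n2)` in di_mwwcdf; the sketch below, with sample sizes chosen only for illustration, shows that the two are the same number.

```python
from math import comb, factorial

n1, n2 = 3, 4  # illustrative sample sizes
n = n1 + n2

# n! / (n1! * (n - n1)!) written out with factorials
by_formula = factorial(n) // (factorial(n1) * factorial(n - n1))

# Choosing the n1 items of the first category is the same as
# choosing the n2 items of the second, so both calls agree
print(by_formula, comb(n, n1), comb(n, n2))  # 35 35 35
```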

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
