Module `stikpetP.distributions.dist_wilcoxon`

Expand source code

from math import comb
from itertools import product

def di_wcdf(T, n, method='shift'):
    '''
    Wilcoxon Cumulative Distribution Function

    This function will give the cumulative probability of a sum of ranks of T, given a sample size of n.

    Some explanation on this distribution can be found in this [YouTube video](https://youtu.be/BKWnTJAp58E). This function is shown in this [YouTube video](https://youtu.be/6amg6Ug2awE) and the test is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Distributions/WilcoxonSignRank.html)
    
    Parameters
    ----------
    T : int
        the sum of ranks
    n : int
        the sample size
    method : {"shift", "enumerate", "recursive"}, optional
        the calculation method to use
    
    Returns
    -------
    float : the requested probability
    
    Notes
    -----
    The enumeration method will create all possible combinations of ranks 1 to n, sum each of these, and then determines the count of each unique sum of ranks. It then uses this to determine the probability and cumulative probabilities.

    The recursive method uses the formula from McCornack (1965, p. 864):
    $$srf\\left(x,y\\right)=\\begin{cases} 0 & x \\lt 0 \\\\ 0 &amp; x\\gt\\binom{y+1}{2} \\\\ 1 & y=1 \\wedge \\left( x=0\\vee x=1 \\right) \\\\ srf^*\\left(x,y\\right) & y\\geq 0 \\end{cases}$$ 
    
    with:
    $$srf^*\\left(x,y\\right) = srf\\left(x-y,y-1\\right) + srf\\left(x,y-1\\right)$$   

    The shift-algorithm from Streitberg and Röhmel (1987), and can also be found in Munzel and Brunner (2002). This works as follows.
    
    * Start with listing all values from 0 to the maximum possible sum of ranks, so 0 to (n×(n+1))/2
    * Create a vector with the value 1 followed by n times a 0.
    * Create a shifted vector by moving all values by 1.
    * Add the two results (the original and the shifted version)
    * This will be the updated vector
    * Shift the vector now by 2
    * Add the two results (the updated vector with and the two shifted version)
    * Repeat these steps each time shifting by one more than the previous. Stop when n-times shifting has been done.

    This function is used by [ts_wilcoxon_os](../tests/test_wilcoxon_os.html#ts_wilcoxon_os) for Wilcoxon Signed Rank Test (One-Sample)
    
    See Also
    --------
    stikpetP.distributions.dist_wilcoxon.di_wpmf : for the probability mass function    
    
    References
    ----------
    McCornack, R. L. (1965). Extended tables of the Wilcoxon matched pair signed rank statistic. *Journal of the American Statistical Association, 60*(311), 864–871. doi:10.2307/2283253
    
    Munzel, U., & Brunner, E. (2002). An exact paired rank test. *Biometrical Journal, 44*(5), 584. doi:10.1002/1521-4036(200207)44:5<584::AID-BIMJ584>3.0.CO;2-9
    
    Streitberg, B., & Röhmel, J. (1987). Exakte Verteilungen für Rang-und Randomisierungstests im allgemeinen c-Stichprobenproblem. *EDV in Medizin und Biologie, 18*(1), 12–19.
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076

    Examples
    --------
    >>> di_wcdf(T=8, n=12)
    0.006103515625
    
    '''

    if method=='recursive':
        pVal=0
        for i in range(0, T+1):
            pVal = pVal + srf(i, n)
        return pVal/(2**n)
        
    elif method=='shift':
        max_rank = int(n*(n+1)/2)
        possible_values = range(0, max_rank+1)
        freqs = [1] + [0]*(max_rank)
    
        for i in range(1, n+1):
            shift = freqs[-(i % (max_rank+1)):] + freqs[:-(i % (max_rank+1))]
            freqs = [freqs[j] + shift[j] for j in range(max_rank + 1)]
            
        n_combs = sum(freqs)
        return sum(freqs[0:T+1])/n_combs

    elif method=='enumerate':
        rank_dist = [c for c in product(list('01'), repeat=n)]
        n_sums = 2**n
        
        for i in range(0, len(rank_dist)):
            numbers = [ int(x) for x in rank_dist[i]]
            rank_dist[i] = numbers 
        
        ranks = [i for i in range(1, n+1)]
        
        sum_rank_dist = []
        for j in range(n_sums):
            sum_temp = 0
            for i in range(n):
                sum_temp += ranks[i]*rank_dist[j][i]
            sum_rank_dist.append(sum_temp)
        
        sum_rank_max = int(n*(n+1)/2)
        
        sum_rank_freq = []
        for i in range(0, int(sum_rank_max) + 1):
            sum_rank_freq.append(sum_rank_dist.count(i))
        
        return sum(sum_rank_freq[0:T+1])/n_sums

def di_wpmf(T, n, method='shift'):
    '''
    Wilcoxon Probability Mass Function

    This function will give the probability of a sum of ranks of T, given a sample size of n.

    Some explanation on this distribution can be found in this [YouTube video](https://youtu.be/BKWnTJAp58E). This function is shown in this [YouTube video](https://youtu.be/6amg6Ug2awE) and the test is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Distributions/WilcoxonSignRank.html)
    
    Parameters
    ----------
    T : int
        the sum of ranks
    n : int
        the sample size
    method : {"shift", "enumerate", "recursive"}, optional
        the calculation method to use
    
    Returns
    -------
    float : the requested probability
    
    Notes
    -----
    The enumeration method will create all possible combinations of ranks 1 to n, sum each of these, and then determines the count of each unique sum of ranks. It then uses this to determine the probability.

    The recursive method uses the formula from McCornack (1965, p. 864):
    $$srf\\left(x,y\\right)=\\begin{cases} 0 & x \\lt 0 \\\\ 0 &amp; x\\gt\\binom{y+1}{2} \\\\ 1 & y=1 \\wedge \\left( x=0\\vee x=1 \\right) \\\\ srf^*\\left(x,y\\right) & y\\geq 0 \\end{cases}$$ 
    
    with:
    $$srf^*\\left(x,y\\right) = srf\\left(x-y,y-1\\right) + srf\\left(x,y-1\\right)$$   

    The shift-algorithm from Streitberg and Röhmel (1987), and can also be found in Munzel and Brunner (2002). This works as follows.
    
    * Start with listing all values from 0 to the maximum possible sum of ranks, so 0 to (n×(n+1))/2
    * Create a vector with the value 1 followed by n times a 0.
    * Create a shifted vector by moving all values by 1.
    * Add the two results (the original and the shifted version)
    * This will be the updated vector
    * Shift the vector now by 2
    * Add the two results (the updated vector with and the two shifted version)
    * Repeat these steps each time shifting by one more than the previous. Stop when n-times shifting has been done.
    
    See Also
    --------
    stikpetP.distributions.dist_wilcoxon.di_wcdf : for the cumulative distribution function
    
    References
    ----------
    McCornack, R. L. (1965). Extended tables of the Wilcoxon matched pair signed rank statistic. *Journal of the American Statistical Association, 60*(311), 864–871. doi:10.2307/2283253
    
    Munzel, U., & Brunner, E. (2002). An exact paired rank test. *Biometrical Journal, 44*(5), 584. doi:10.1002/1521-4036(200207)44:5<584::AID-BIMJ584>3.0.CO;2-9
    
    Streitberg, B., & Röhmel, J. (1987). Exakte Verteilungen für Rang-und Randomisierungstests im allgemeinen c-Stichprobenproblem. *EDV in Medizin und Biologie, 18*(1), 12–19.
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076

    Examples
    --------
    >>> di_wpmf(T=8, n=12)
    0.00146484375
    
    '''

    if method=='recursive':        
        return srf(T, n)/(2**n)
        
    elif method=='shift':
        max_rank = int(n*(n+1)/2)
        possible_values = range(0, max_rank+1)
        freqs = [1] + [0]*(max_rank)
    
        for i in range(1, n+1):
            shift = freqs[max_rank + 1 - i : max_rank + 2] + freqs[0 : max_rank + 1 - i]
            freqs = [freqs[j] + shift[j] for j in range(max_rank + 1)]
            
        n_combs = sum(freqs)
        return [i/n_combs for i in freqs][T]

    elif method=='enumerate':
        rank_dist = [c for c in product(list('01'), repeat=n)]
        n_sums = 2**n
        
        for i in range(0, len(rank_dist)):
            numbers = [ int(x) for x in rank_dist[i]]
            rank_dist[i] = numbers 
        
        ranks = [i for i in range(1, n+1)]
        
        sum_rank_dist = []
        for j in range(n_sums):
            sum_temp = 0
            for i in range(n):
                sum_temp += ranks[i]*rank_dist[j][i]
            sum_rank_dist.append(sum_temp)
        
        sum_rank_max = int(n*(n+1)/2)
        
        sum_rank_freq = []
        for i in range(0, int(sum_rank_max) + 1):
            sum_rank_freq.append(sum_rank_dist.count(i))
        
        return [i/n_sums for i in sum_rank_freq][T]

def srf(x,y): 
    '''
    sum rank frequency
    '''    
    if x<0:
        return 0
    elif x > comb(y+1, 2):
        return 0
    elif y==1 and (x==0 or x==1):
        return 1
    elif y>=0:
        return srf(x-y, y-1) + srf(x, y-1)

Functions

def di_wcdf(T, n, method='shift')

Wilcoxon Cumulative Distribution Function

This function will give the cumulative probability of a sum of ranks of T, given a sample size of n.

Some explanation on this distribution can be found in this YouTube video. This function is shown in this YouTube video and the test is also described at PeterStatistics.com

Parameters

T : int: the sum of ranks
n : int: the sample size
method : {"shift", "enumerate", "recursive"}, optional: the calculation method to use

Returns

float : the requested probability

Notes

The enumeration method will create all possible combinations of ranks 1 to n, sum each of these, and then determines the count of each unique sum of ranks. It then uses this to determine the probability and cumulative probabilities.

The recursive method uses the formula from McCornack (1965, p. 864): $srf\left(x,y\right)=\begin{cases} 0 & x \lt 0 \\ 0 & x\gt\binom{y+1}{2} \\ 1 & y=1 \wedge \left( x=0\vee x=1 \right) \\ srf^*\left(x,y\right) & y\geq 0 \end{cases}$

with: $srf^*\left(x,y\right) = srf\left(x-y,y-1\right) + srf\left(x,y-1\right)$

The shift-algorithm from Streitberg and Röhmel (1987), and can also be found in Munzel and Brunner (2002). This works as follows.

Start with listing all values from 0 to the maximum possible sum of ranks, so 0 to (n×(n+1))/2
Create a vector with the value 1 followed by n times a 0.
Create a shifted vector by moving all values by 1.
Add the two results (the original and the shifted version)
This will be the updated vector
Shift the vector now by 2
Add the two results (the updated vector with and the two shifted version)
Repeat these steps each time shifting by one more than the previous. Stop when n-times shifting has been done.

This function is used by ts_wilcoxon_os for Wilcoxon Signed Rank Test (One-Sample)

References

McCornack, R. L. (1965). Extended tables of the Wilcoxon matched pair signed rank statistic. Journal of the American Statistical Association, 60(311), 864–871. doi:10.2307/2283253

Munzel, U., & Brunner, E. (2002). An exact paired rank test. Biometrical Journal, 44(5), 584. doi:10.1002/1521-4036(200207)44:5<584::AID-BIMJ584>3.0.CO;2-9

Streitberg, B., & Röhmel, J. (1987). Exakte Verteilungen für Rang-und Randomisierungstests im allgemeinen c-Stichprobenproblem. EDV in Medizin und Biologie, 18(1), 12–19.

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076

Examples

>>> di_wcdf(T=8, n=12)
0.006103515625

Expand source code

def di_wcdf(T, n, method='shift'):
    '''
    Wilcoxon Cumulative Distribution Function

    This function will give the cumulative probability of a sum of ranks of T, given a sample size of n.

    Some explanation on this distribution can be found in this [YouTube video](https://youtu.be/BKWnTJAp58E). This function is shown in this [YouTube video](https://youtu.be/6amg6Ug2awE) and the test is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Distributions/WilcoxonSignRank.html)
    
    Parameters
    ----------
    T : int
        the sum of ranks
    n : int
        the sample size
    method : {"shift", "enumerate", "recursive"}, optional
        the calculation method to use
    
    Returns
    -------
    float : the requested probability
    
    Notes
    -----
    The enumeration method will create all possible combinations of ranks 1 to n, sum each of these, and then determines the count of each unique sum of ranks. It then uses this to determine the probability and cumulative probabilities.

    The recursive method uses the formula from McCornack (1965, p. 864):
    $$srf\\left(x,y\\right)=\\begin{cases} 0 & x \\lt 0 \\\\ 0 &amp; x\\gt\\binom{y+1}{2} \\\\ 1 & y=1 \\wedge \\left( x=0\\vee x=1 \\right) \\\\ srf^*\\left(x,y\\right) & y\\geq 0 \\end{cases}$$ 
    
    with:
    $$srf^*\\left(x,y\\right) = srf\\left(x-y,y-1\\right) + srf\\left(x,y-1\\right)$$   

    The shift-algorithm from Streitberg and Röhmel (1987), and can also be found in Munzel and Brunner (2002). This works as follows.
    
    * Start with listing all values from 0 to the maximum possible sum of ranks, so 0 to (n×(n+1))/2
    * Create a vector with the value 1 followed by n times a 0.
    * Create a shifted vector by moving all values by 1.
    * Add the two results (the original and the shifted version)
    * This will be the updated vector
    * Shift the vector now by 2
    * Add the two results (the updated vector with and the two shifted version)
    * Repeat these steps each time shifting by one more than the previous. Stop when n-times shifting has been done.

    This function is used by [ts_wilcoxon_os](../tests/test_wilcoxon_os.html#ts_wilcoxon_os) for Wilcoxon Signed Rank Test (One-Sample)
    
    See Also
    --------
    stikpetP.distributions.dist_wilcoxon.di_wpmf : for the probability mass function    
    
    References
    ----------
    McCornack, R. L. (1965). Extended tables of the Wilcoxon matched pair signed rank statistic. *Journal of the American Statistical Association, 60*(311), 864–871. doi:10.2307/2283253
    
    Munzel, U., & Brunner, E. (2002). An exact paired rank test. *Biometrical Journal, 44*(5), 584. doi:10.1002/1521-4036(200207)44:5<584::AID-BIMJ584>3.0.CO;2-9
    
    Streitberg, B., & Röhmel, J. (1987). Exakte Verteilungen für Rang-und Randomisierungstests im allgemeinen c-Stichprobenproblem. *EDV in Medizin und Biologie, 18*(1), 12–19.
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076

    Examples
    --------
    >>> di_wcdf(T=8, n=12)
    0.006103515625
    
    '''

    if method=='recursive':
        pVal=0
        for i in range(0, T+1):
            pVal = pVal + srf(i, n)
        return pVal/(2**n)
        
    elif method=='shift':
        max_rank = int(n*(n+1)/2)
        possible_values = range(0, max_rank+1)
        freqs = [1] + [0]*(max_rank)
    
        for i in range(1, n+1):
            shift = freqs[-(i % (max_rank+1)):] + freqs[:-(i % (max_rank+1))]
            freqs = [freqs[j] + shift[j] for j in range(max_rank + 1)]
            
        n_combs = sum(freqs)
        return sum(freqs[0:T+1])/n_combs

    elif method=='enumerate':
        rank_dist = [c for c in product(list('01'), repeat=n)]
        n_sums = 2**n
        
        for i in range(0, len(rank_dist)):
            numbers = [ int(x) for x in rank_dist[i]]
            rank_dist[i] = numbers 
        
        ranks = [i for i in range(1, n+1)]
        
        sum_rank_dist = []
        for j in range(n_sums):
            sum_temp = 0
            for i in range(n):
                sum_temp += ranks[i]*rank_dist[j][i]
            sum_rank_dist.append(sum_temp)
        
        sum_rank_max = int(n*(n+1)/2)
        
        sum_rank_freq = []
        for i in range(0, int(sum_rank_max) + 1):
            sum_rank_freq.append(sum_rank_dist.count(i))
        
        return sum(sum_rank_freq[0:T+1])/n_sums

def di_wpmf(T, n, method='shift')

Wilcoxon Probability Mass Function

This function will give the probability of a sum of ranks of T, given a sample size of n.

Some explanation on this distribution can be found in this YouTube video. This function is shown in this YouTube video and the test is also described at PeterStatistics.com

Parameters

T : int: the sum of ranks
n : int: the sample size
method : {"shift", "enumerate", "recursive"}, optional: the calculation method to use

Returns

float : the requested probability

Notes

with: $srf^*\left(x,y\right) = srf\left(x-y,y-1\right) + srf\left(x,y-1\right)$

The shift-algorithm from Streitberg and Röhmel (1987), and can also be found in Munzel and Brunner (2002). This works as follows.

Start with listing all values from 0 to the maximum possible sum of ranks, so 0 to (n×(n+1))/2
Create a vector with the value 1 followed by n times a 0.
Create a shifted vector by moving all values by 1.
Add the two results (the original and the shifted version)
This will be the updated vector
Shift the vector now by 2
Add the two results (the updated vector with and the two shifted version)
Repeat these steps each time shifting by one more than the previous. Stop when n-times shifting has been done.

References

McCornack, R. L. (1965). Extended tables of the Wilcoxon matched pair signed rank statistic. Journal of the American Statistical Association, 60(311), 864–871. doi:10.2307/2283253

Munzel, U., & Brunner, E. (2002). An exact paired rank test. Biometrical Journal, 44(5), 584. doi:10.1002/1521-4036(200207)44:5<584::AID-BIMJ584>3.0.CO;2-9

Streitberg, B., & Röhmel, J. (1987). Exakte Verteilungen für Rang-und Randomisierungstests im allgemeinen c-Stichprobenproblem. EDV in Medizin und Biologie, 18(1), 12–19.

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076

Examples

>>> di_wpmf(T=8, n=12)
0.00146484375

Expand source code

def di_wpmf(T, n, method='shift'):
    '''
    Wilcoxon Probability Mass Function

    This function will give the probability of a sum of ranks of T, given a sample size of n.

    Some explanation on this distribution can be found in this [YouTube video](https://youtu.be/BKWnTJAp58E). This function is shown in this [YouTube video](https://youtu.be/6amg6Ug2awE) and the test is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Distributions/WilcoxonSignRank.html)
    
    Parameters
    ----------
    T : int
        the sum of ranks
    n : int
        the sample size
    method : {"shift", "enumerate", "recursive"}, optional
        the calculation method to use
    
    Returns
    -------
    float : the requested probability
    
    Notes
    -----
    The enumeration method will create all possible combinations of ranks 1 to n, sum each of these, and then determines the count of each unique sum of ranks. It then uses this to determine the probability.

    The recursive method uses the formula from McCornack (1965, p. 864):
    $$srf\\left(x,y\\right)=\\begin{cases} 0 & x \\lt 0 \\\\ 0 &amp; x\\gt\\binom{y+1}{2} \\\\ 1 & y=1 \\wedge \\left( x=0\\vee x=1 \\right) \\\\ srf^*\\left(x,y\\right) & y\\geq 0 \\end{cases}$$ 
    
    with:
    $$srf^*\\left(x,y\\right) = srf\\left(x-y,y-1\\right) + srf\\left(x,y-1\\right)$$   

    The shift-algorithm from Streitberg and Röhmel (1987), and can also be found in Munzel and Brunner (2002). This works as follows.
    
    * Start with listing all values from 0 to the maximum possible sum of ranks, so 0 to (n×(n+1))/2
    * Create a vector with the value 1 followed by n times a 0.
    * Create a shifted vector by moving all values by 1.
    * Add the two results (the original and the shifted version)
    * This will be the updated vector
    * Shift the vector now by 2
    * Add the two results (the updated vector with and the two shifted version)
    * Repeat these steps each time shifting by one more than the previous. Stop when n-times shifting has been done.
    
    See Also
    --------
    stikpetP.distributions.dist_wilcoxon.di_wcdf : for the cumulative distribution function
    
    References
    ----------
    McCornack, R. L. (1965). Extended tables of the Wilcoxon matched pair signed rank statistic. *Journal of the American Statistical Association, 60*(311), 864–871. doi:10.2307/2283253
    
    Munzel, U., & Brunner, E. (2002). An exact paired rank test. *Biometrical Journal, 44*(5), 584. doi:10.1002/1521-4036(200207)44:5<584::AID-BIMJ584>3.0.CO;2-9
    
    Streitberg, B., & Röhmel, J. (1987). Exakte Verteilungen für Rang-und Randomisierungstests im allgemeinen c-Stichprobenproblem. *EDV in Medizin und Biologie, 18*(1), 12–19.
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076

    Examples
    --------
    >>> di_wpmf(T=8, n=12)
    0.00146484375
    
    '''

    if method=='recursive':        
        return srf(T, n)/(2**n)
        
    elif method=='shift':
        max_rank = int(n*(n+1)/2)
        possible_values = range(0, max_rank+1)
        freqs = [1] + [0]*(max_rank)
    
        for i in range(1, n+1):
            shift = freqs[max_rank + 1 - i : max_rank + 2] + freqs[0 : max_rank + 1 - i]
            freqs = [freqs[j] + shift[j] for j in range(max_rank + 1)]
            
        n_combs = sum(freqs)
        return [i/n_combs for i in freqs][T]

    elif method=='enumerate':
        rank_dist = [c for c in product(list('01'), repeat=n)]
        n_sums = 2**n
        
        for i in range(0, len(rank_dist)):
            numbers = [ int(x) for x in rank_dist[i]]
            rank_dist[i] = numbers 
        
        ranks = [i for i in range(1, n+1)]
        
        sum_rank_dist = []
        for j in range(n_sums):
            sum_temp = 0
            for i in range(n):
                sum_temp += ranks[i]*rank_dist[j][i]
            sum_rank_dist.append(sum_temp)
        
        sum_rank_max = int(n*(n+1)/2)
        
        sum_rank_freq = []
        for i in range(0, int(sum_rank_max) + 1):
            sum_rank_freq.append(sum_rank_dist.count(i))
        
        return [i/n_sums for i in sum_rank_freq][T]

def srf(x, y)

sum rank frequency

Expand source code

def srf(x,y): 
    '''
    sum rank frequency
    '''    
    if x<0:
        return 0
    elif x > comb(y+1, 2):
        return 0
    elif y==1 and (x==0 or x==1):
        return 1
    elif y>=0:
        return srf(x-y, y-1) + srf(x, y-1)