Module stikpetP.distributions.dist_wilcoxon
Expand source code
from math import comb
from itertools import product
def di_wcdf(T, n, method='shift'):
'''
Wilcoxon Cumulative Distribution Function
This function will give the cumulative probability of a sum of ranks of T, given a sample size of n.
Some explanation on this distribution can be found in this [YouTube video](https://youtu.be/BKWnTJAp58E). This function is shown in this [YouTube video](https://youtu.be/6amg6Ug2awE) and the test is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Distributions/WilcoxonSignRank.html)
Parameters
----------
T : int
the sum of ranks
n : int
the sample size
method : {"shift", "enumerate", "recursive"}, optional
the calculation method to use
Returns
-------
float : the requested probability
Notes
-----
The enumeration method will create all possible combinations of ranks 1 to n, sum each of these, and then determines the count of each unique sum of ranks. It then uses this to determine the probability and cumulative probabilities.
The recursive method uses the formula from McCornack (1965, p. 864):
$$srf\\left(x,y\\right)=\\begin{cases} 0 & x \\lt 0 \\\\ 0 & x\\gt\\binom{y+1}{2} \\\\ 1 & y=1 \\wedge \\left( x=0\\vee x=1 \\right) \\\\ srf^*\\left(x,y\\right) & y\\geq 0 \\end{cases}$$
with:
$$srf^*\\left(x,y\\right) = srf\\left(x-y,y-1\\right) + srf\\left(x,y-1\\right)$$
The shift-algorithm from Streitberg and Röhmel (1987), and can also be found in Munzel and Brunner (2002). This works as follows.
* Start with listing all values from 0 to the maximum possible sum of ranks, so 0 to (n×(n+1))/2
* Create a vector with the value 1 followed by n times a 0.
* Create a shifted vector by moving all values by 1.
* Add the two results (the original and the shifted version)
* This will be the updated vector
* Shift the vector now by 2
* Add the two results (the updated vector with and the two shifted version)
* Repeat these steps each time shifting by one more than the previous. Stop when n-times shifting has been done.
This function is used by [ts_wilcoxon_os](../tests/test_wilcoxon_os.html#ts_wilcoxon_os) for Wilcoxon Signed Rank Test (One-Sample)
See Also
--------
stikpetP.distributions.dist_wilcoxon.di_wpmf : for the probability mass function
References
----------
McCornack, R. L. (1965). Extended tables of the Wilcoxon matched pair signed rank statistic. *Journal of the American Statistical Association, 60*(311), 864–871. doi:10.2307/2283253
Munzel, U., & Brunner, E. (2002). An exact paired rank test. *Biometrical Journal, 44*(5), 584. doi:10.1002/1521-4036(200207)44:5<584::AID-BIMJ584>3.0.CO;2-9
Streitberg, B., & Röhmel, J. (1987). Exakte Verteilungen für Rang-und Randomisierungstests im allgemeinen c-Stichprobenproblem. *EDV in Medizin und Biologie, 18*(1), 12–19.
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
--------
>>> di_wcdf(T=8, n=12)
0.006103515625
'''
if method=='recursive':
pVal=0
for i in range(0, T+1):
pVal = pVal + srf(i, n)
return pVal/(2**n)
elif method=='shift':
max_rank = int(n*(n+1)/2)
possible_values = range(0, max_rank+1)
freqs = [1] + [0]*(max_rank)
for i in range(1, n+1):
shift = freqs[-(i % (max_rank+1)):] + freqs[:-(i % (max_rank+1))]
freqs = [freqs[j] + shift[j] for j in range(max_rank + 1)]
n_combs = sum(freqs)
return sum(freqs[0:T+1])/n_combs
elif method=='enumerate':
rank_dist = [c for c in product(list('01'), repeat=n)]
n_sums = 2**n
for i in range(0, len(rank_dist)):
numbers = [ int(x) for x in rank_dist[i]]
rank_dist[i] = numbers
ranks = [i for i in range(1, n+1)]
sum_rank_dist = []
for j in range(n_sums):
sum_temp = 0
for i in range(n):
sum_temp += ranks[i]*rank_dist[j][i]
sum_rank_dist.append(sum_temp)
sum_rank_max = int(n*(n+1)/2)
sum_rank_freq = []
for i in range(0, int(sum_rank_max) + 1):
sum_rank_freq.append(sum_rank_dist.count(i))
return sum(sum_rank_freq[0:T+1])/n_sums
def di_wpmf(T, n, method='shift'):
'''
Wilcoxon Probability Mass Function
This function will give the probability of a sum of ranks of T, given a sample size of n.
Some explanation on this distribution can be found in this [YouTube video](https://youtu.be/BKWnTJAp58E). This function is shown in this [YouTube video](https://youtu.be/6amg6Ug2awE) and the test is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Distributions/WilcoxonSignRank.html)
Parameters
----------
T : int
the sum of ranks
n : int
the sample size
method : {"shift", "enumerate", "recursive"}, optional
the calculation method to use
Returns
-------
float : the requested probability
Notes
-----
The enumeration method will create all possible combinations of ranks 1 to n, sum each of these, and then determines the count of each unique sum of ranks. It then uses this to determine the probability.
The recursive method uses the formula from McCornack (1965, p. 864):
$$srf\\left(x,y\\right)=\\begin{cases} 0 & x \\lt 0 \\\\ 0 & x\\gt\\binom{y+1}{2} \\\\ 1 & y=1 \\wedge \\left( x=0\\vee x=1 \\right) \\\\ srf^*\\left(x,y\\right) & y\\geq 0 \\end{cases}$$
with:
$$srf^*\\left(x,y\\right) = srf\\left(x-y,y-1\\right) + srf\\left(x,y-1\\right)$$
The shift-algorithm from Streitberg and Röhmel (1987), and can also be found in Munzel and Brunner (2002). This works as follows.
* Start with listing all values from 0 to the maximum possible sum of ranks, so 0 to (n×(n+1))/2
* Create a vector with the value 1 followed by n times a 0.
* Create a shifted vector by moving all values by 1.
* Add the two results (the original and the shifted version)
* This will be the updated vector
* Shift the vector now by 2
* Add the two results (the updated vector with and the two shifted version)
* Repeat these steps each time shifting by one more than the previous. Stop when n-times shifting has been done.
See Also
--------
stikpetP.distributions.dist_wilcoxon.di_wcdf : for the cumulative distribution function
References
----------
McCornack, R. L. (1965). Extended tables of the Wilcoxon matched pair signed rank statistic. *Journal of the American Statistical Association, 60*(311), 864–871. doi:10.2307/2283253
Munzel, U., & Brunner, E. (2002). An exact paired rank test. *Biometrical Journal, 44*(5), 584. doi:10.1002/1521-4036(200207)44:5<584::AID-BIMJ584>3.0.CO;2-9
Streitberg, B., & Röhmel, J. (1987). Exakte Verteilungen für Rang-und Randomisierungstests im allgemeinen c-Stichprobenproblem. *EDV in Medizin und Biologie, 18*(1), 12–19.
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
--------
>>> di_wpmf(T=8, n=12)
0.00146484375
'''
if method=='recursive':
return srf(T, n)/(2**n)
elif method=='shift':
max_rank = int(n*(n+1)/2)
possible_values = range(0, max_rank+1)
freqs = [1] + [0]*(max_rank)
for i in range(1, n+1):
shift = freqs[max_rank + 1 - i : max_rank + 2] + freqs[0 : max_rank + 1 - i]
freqs = [freqs[j] + shift[j] for j in range(max_rank + 1)]
n_combs = sum(freqs)
return [i/n_combs for i in freqs][T]
elif method=='enumerate':
rank_dist = [c for c in product(list('01'), repeat=n)]
n_sums = 2**n
for i in range(0, len(rank_dist)):
numbers = [ int(x) for x in rank_dist[i]]
rank_dist[i] = numbers
ranks = [i for i in range(1, n+1)]
sum_rank_dist = []
for j in range(n_sums):
sum_temp = 0
for i in range(n):
sum_temp += ranks[i]*rank_dist[j][i]
sum_rank_dist.append(sum_temp)
sum_rank_max = int(n*(n+1)/2)
sum_rank_freq = []
for i in range(0, int(sum_rank_max) + 1):
sum_rank_freq.append(sum_rank_dist.count(i))
return [i/n_sums for i in sum_rank_freq][T]
def srf(x,y):
'''
sum rank frequency
'''
if x<0:
return 0
elif x > comb(y+1, 2):
return 0
elif y==1 and (x==0 or x==1):
return 1
elif y>=0:
return srf(x-y, y-1) + srf(x, y-1)
Functions
def di_wcdf(T, n, method='shift')
-
Wilcoxon Cumulative Distribution Function
This function will give the cumulative probability of a sum of ranks of T, given a sample size of n.
Some explanation on this distribution can be found in this YouTube video. This function is shown in this YouTube video and the test is also described at PeterStatistics.com
Parameters
T
:int
- the sum of ranks
n
:int
- the sample size
method
:{"shift", "enumerate", "recursive"}
, optional- the calculation method to use
Returns
float
:the requested probability
Notes
The enumeration method will create all possible combinations of ranks 1 to n, sum each of these, and then determines the count of each unique sum of ranks. It then uses this to determine the probability and cumulative probabilities.
The recursive method uses the formula from McCornack (1965, p. 864): srf\left(x,y\right)=\begin{cases} 0 & x \lt 0 \\ 0 & x\gt\binom{y+1}{2} \\ 1 & y=1 \wedge \left( x=0\vee x=1 \right) \\ srf^*\left(x,y\right) & y\geq 0 \end{cases}
with: srf^*\left(x,y\right) = srf\left(x-y,y-1\right) + srf\left(x,y-1\right)
The shift-algorithm from Streitberg and Röhmel (1987), and can also be found in Munzel and Brunner (2002). This works as follows.
- Start with listing all values from 0 to the maximum possible sum of ranks, so 0 to (n×(n+1))/2
- Create a vector with the value 1 followed by n times a 0.
- Create a shifted vector by moving all values by 1.
- Add the two results (the original and the shifted version)
- This will be the updated vector
- Shift the vector now by 2
- Add the two results (the updated vector with and the two shifted version)
- Repeat these steps each time shifting by one more than the previous. Stop when n-times shifting has been done.
This function is used by ts_wilcoxon_os for Wilcoxon Signed Rank Test (One-Sample)
See Also
di_wpmf()
- for the probability mass function
References
McCornack, R. L. (1965). Extended tables of the Wilcoxon matched pair signed rank statistic. Journal of the American Statistical Association, 60(311), 864–871. doi:10.2307/2283253
Munzel, U., & Brunner, E. (2002). An exact paired rank test. Biometrical Journal, 44(5), 584. doi:10.1002/1521-4036(200207)44:5<584::AID-BIMJ584>3.0.CO;2-9
Streitberg, B., & Röhmel, J. (1987). Exakte Verteilungen für Rang-und Randomisierungstests im allgemeinen c-Stichprobenproblem. EDV in Medizin und Biologie, 18(1), 12–19.
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076Examples
>>> di_wcdf(T=8, n=12) 0.006103515625
Expand source code
def di_wcdf(T, n, method='shift'): ''' Wilcoxon Cumulative Distribution Function This function will give the cumulative probability of a sum of ranks of T, given a sample size of n. Some explanation on this distribution can be found in this [YouTube video](https://youtu.be/BKWnTJAp58E). This function is shown in this [YouTube video](https://youtu.be/6amg6Ug2awE) and the test is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Distributions/WilcoxonSignRank.html) Parameters ---------- T : int the sum of ranks n : int the sample size method : {"shift", "enumerate", "recursive"}, optional the calculation method to use Returns ------- float : the requested probability Notes ----- The enumeration method will create all possible combinations of ranks 1 to n, sum each of these, and then determines the count of each unique sum of ranks. It then uses this to determine the probability and cumulative probabilities. The recursive method uses the formula from McCornack (1965, p. 864): $$srf\\left(x,y\\right)=\\begin{cases} 0 & x \\lt 0 \\\\ 0 & x\\gt\\binom{y+1}{2} \\\\ 1 & y=1 \\wedge \\left( x=0\\vee x=1 \\right) \\\\ srf^*\\left(x,y\\right) & y\\geq 0 \\end{cases}$$ with: $$srf^*\\left(x,y\\right) = srf\\left(x-y,y-1\\right) + srf\\left(x,y-1\\right)$$ The shift-algorithm from Streitberg and Röhmel (1987), and can also be found in Munzel and Brunner (2002). This works as follows. * Start with listing all values from 0 to the maximum possible sum of ranks, so 0 to (n×(n+1))/2 * Create a vector with the value 1 followed by n times a 0. * Create a shifted vector by moving all values by 1. * Add the two results (the original and the shifted version) * This will be the updated vector * Shift the vector now by 2 * Add the two results (the updated vector with and the two shifted version) * Repeat these steps each time shifting by one more than the previous. Stop when n-times shifting has been done. This function is used by [ts_wilcoxon_os](../tests/test_wilcoxon_os.html#ts_wilcoxon_os) for Wilcoxon Signed Rank Test (One-Sample) See Also -------- stikpetP.distributions.dist_wilcoxon.di_wpmf : for the probability mass function References ---------- McCornack, R. L. (1965). Extended tables of the Wilcoxon matched pair signed rank statistic. *Journal of the American Statistical Association, 60*(311), 864–871. doi:10.2307/2283253 Munzel, U., & Brunner, E. (2002). An exact paired rank test. *Biometrical Journal, 44*(5), 584. doi:10.1002/1521-4036(200207)44:5<584::AID-BIMJ584>3.0.CO;2-9 Streitberg, B., & Röhmel, J. (1987). Exakte Verteilungen für Rang-und Randomisierungstests im allgemeinen c-Stichprobenproblem. *EDV in Medizin und Biologie, 18*(1), 12–19. Author ------ Made by P. Stikker Companion website: https://PeterStatistics.com YouTube channel: https://www.youtube.com/stikpet Donations: https://www.patreon.com/bePatron?u=19398076 Examples -------- >>> di_wcdf(T=8, n=12) 0.006103515625 ''' if method=='recursive': pVal=0 for i in range(0, T+1): pVal = pVal + srf(i, n) return pVal/(2**n) elif method=='shift': max_rank = int(n*(n+1)/2) possible_values = range(0, max_rank+1) freqs = [1] + [0]*(max_rank) for i in range(1, n+1): shift = freqs[-(i % (max_rank+1)):] + freqs[:-(i % (max_rank+1))] freqs = [freqs[j] + shift[j] for j in range(max_rank + 1)] n_combs = sum(freqs) return sum(freqs[0:T+1])/n_combs elif method=='enumerate': rank_dist = [c for c in product(list('01'), repeat=n)] n_sums = 2**n for i in range(0, len(rank_dist)): numbers = [ int(x) for x in rank_dist[i]] rank_dist[i] = numbers ranks = [i for i in range(1, n+1)] sum_rank_dist = [] for j in range(n_sums): sum_temp = 0 for i in range(n): sum_temp += ranks[i]*rank_dist[j][i] sum_rank_dist.append(sum_temp) sum_rank_max = int(n*(n+1)/2) sum_rank_freq = [] for i in range(0, int(sum_rank_max) + 1): sum_rank_freq.append(sum_rank_dist.count(i)) return sum(sum_rank_freq[0:T+1])/n_sums
def di_wpmf(T, n, method='shift')
-
Wilcoxon Probability Mass Function
This function will give the probability of a sum of ranks of T, given a sample size of n.
Some explanation on this distribution can be found in this YouTube video. This function is shown in this YouTube video and the test is also described at PeterStatistics.com
Parameters
T
:int
- the sum of ranks
n
:int
- the sample size
method
:{"shift", "enumerate", "recursive"}
, optional- the calculation method to use
Returns
float
:the requested probability
Notes
The enumeration method will create all possible combinations of ranks 1 to n, sum each of these, and then determines the count of each unique sum of ranks. It then uses this to determine the probability.
The recursive method uses the formula from McCornack (1965, p. 864): srf\left(x,y\right)=\begin{cases} 0 & x \lt 0 \\ 0 & x\gt\binom{y+1}{2} \\ 1 & y=1 \wedge \left( x=0\vee x=1 \right) \\ srf^*\left(x,y\right) & y\geq 0 \end{cases}
with: srf^*\left(x,y\right) = srf\left(x-y,y-1\right) + srf\left(x,y-1\right)
The shift-algorithm from Streitberg and Röhmel (1987), and can also be found in Munzel and Brunner (2002). This works as follows.
- Start with listing all values from 0 to the maximum possible sum of ranks, so 0 to (n×(n+1))/2
- Create a vector with the value 1 followed by n times a 0.
- Create a shifted vector by moving all values by 1.
- Add the two results (the original and the shifted version)
- This will be the updated vector
- Shift the vector now by 2
- Add the two results (the updated vector with and the two shifted version)
- Repeat these steps each time shifting by one more than the previous. Stop when n-times shifting has been done.
See Also
di_wcdf()
- for the cumulative distribution function
References
McCornack, R. L. (1965). Extended tables of the Wilcoxon matched pair signed rank statistic. Journal of the American Statistical Association, 60(311), 864–871. doi:10.2307/2283253
Munzel, U., & Brunner, E. (2002). An exact paired rank test. Biometrical Journal, 44(5), 584. doi:10.1002/1521-4036(200207)44:5<584::AID-BIMJ584>3.0.CO;2-9
Streitberg, B., & Röhmel, J. (1987). Exakte Verteilungen für Rang-und Randomisierungstests im allgemeinen c-Stichprobenproblem. EDV in Medizin und Biologie, 18(1), 12–19.
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076Examples
>>> di_wpmf(T=8, n=12) 0.00146484375
Expand source code
def di_wpmf(T, n, method='shift'): ''' Wilcoxon Probability Mass Function This function will give the probability of a sum of ranks of T, given a sample size of n. Some explanation on this distribution can be found in this [YouTube video](https://youtu.be/BKWnTJAp58E). This function is shown in this [YouTube video](https://youtu.be/6amg6Ug2awE) and the test is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Distributions/WilcoxonSignRank.html) Parameters ---------- T : int the sum of ranks n : int the sample size method : {"shift", "enumerate", "recursive"}, optional the calculation method to use Returns ------- float : the requested probability Notes ----- The enumeration method will create all possible combinations of ranks 1 to n, sum each of these, and then determines the count of each unique sum of ranks. It then uses this to determine the probability. The recursive method uses the formula from McCornack (1965, p. 864): $$srf\\left(x,y\\right)=\\begin{cases} 0 & x \\lt 0 \\\\ 0 & x\\gt\\binom{y+1}{2} \\\\ 1 & y=1 \\wedge \\left( x=0\\vee x=1 \\right) \\\\ srf^*\\left(x,y\\right) & y\\geq 0 \\end{cases}$$ with: $$srf^*\\left(x,y\\right) = srf\\left(x-y,y-1\\right) + srf\\left(x,y-1\\right)$$ The shift-algorithm from Streitberg and Röhmel (1987), and can also be found in Munzel and Brunner (2002). This works as follows. * Start with listing all values from 0 to the maximum possible sum of ranks, so 0 to (n×(n+1))/2 * Create a vector with the value 1 followed by n times a 0. * Create a shifted vector by moving all values by 1. * Add the two results (the original and the shifted version) * This will be the updated vector * Shift the vector now by 2 * Add the two results (the updated vector with and the two shifted version) * Repeat these steps each time shifting by one more than the previous. Stop when n-times shifting has been done. See Also -------- stikpetP.distributions.dist_wilcoxon.di_wcdf : for the cumulative distribution function References ---------- McCornack, R. L. (1965). Extended tables of the Wilcoxon matched pair signed rank statistic. *Journal of the American Statistical Association, 60*(311), 864–871. doi:10.2307/2283253 Munzel, U., & Brunner, E. (2002). An exact paired rank test. *Biometrical Journal, 44*(5), 584. doi:10.1002/1521-4036(200207)44:5<584::AID-BIMJ584>3.0.CO;2-9 Streitberg, B., & Röhmel, J. (1987). Exakte Verteilungen für Rang-und Randomisierungstests im allgemeinen c-Stichprobenproblem. *EDV in Medizin und Biologie, 18*(1), 12–19. Author ------ Made by P. Stikker Companion website: https://PeterStatistics.com YouTube channel: https://www.youtube.com/stikpet Donations: https://www.patreon.com/bePatron?u=19398076 Examples -------- >>> di_wpmf(T=8, n=12) 0.00146484375 ''' if method=='recursive': return srf(T, n)/(2**n) elif method=='shift': max_rank = int(n*(n+1)/2) possible_values = range(0, max_rank+1) freqs = [1] + [0]*(max_rank) for i in range(1, n+1): shift = freqs[max_rank + 1 - i : max_rank + 2] + freqs[0 : max_rank + 1 - i] freqs = [freqs[j] + shift[j] for j in range(max_rank + 1)] n_combs = sum(freqs) return [i/n_combs for i in freqs][T] elif method=='enumerate': rank_dist = [c for c in product(list('01'), repeat=n)] n_sums = 2**n for i in range(0, len(rank_dist)): numbers = [ int(x) for x in rank_dist[i]] rank_dist[i] = numbers ranks = [i for i in range(1, n+1)] sum_rank_dist = [] for j in range(n_sums): sum_temp = 0 for i in range(n): sum_temp += ranks[i]*rank_dist[j][i] sum_rank_dist.append(sum_temp) sum_rank_max = int(n*(n+1)/2) sum_rank_freq = [] for i in range(0, int(sum_rank_max) + 1): sum_rank_freq.append(sum_rank_dist.count(i)) return [i/n_sums for i in sum_rank_freq][T]
def srf(x, y)
-
sum rank frequency
Expand source code
def srf(x,y): ''' sum rank frequency ''' if x<0: return 0 elif x > comb(y+1, 2): return 0 elif y==1 and (x==0 or x==1): return 1 elif y>=0: return srf(x-y, y-1) + srf(x, y-1)