Module stikpetP.distributions.dist_multinomial
Expand source code
from math import gamma, prod, factorial, lgamma, log, exp
def di_mpmf(F, P, method='loggamma'):
'''
Multinomial Probability Mass Function
-----------------------------
This is a function for the multinomial probability. It returns the probability of a distribution as given in F for a sample size of sum of F, where the probability for each category is given as in P. It is a generalization of the binomial distribution.
The distribution is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Distributions/Multinomial.html)
Parameters
----------
F : list
the observed counts
P : list
the probabilities for each category
method : {"loggamma", "factorial", "gamma", "mprob"}, optional
calculation method to use
Returns
-------
mpmf : the probability
Notes
-----
If *method=factorial* the following formula is used:
$$mpmf\\left(F, P\\right) = \\frac{n!}{\\prod_{i=1}^{k} \\left(F_i!\\right)} \\times \\prod_{i=1}^{k} P_i^{F_i}$$
This formula was most likely already used by for example Edgeworth (1905), but can for example also be found in Berry and Mielke (1995, p. 769).
If *method=gamma*:
$$mpmf\\left(F, P\\right) = \\frac{\\Gamma\\left(1+n\\right)}{\\prod_{i=1}^n \\Gamma\\left(1+F_i\\right)} \\times \\prod_{i=1}^{k} P_i^{F_i}$$
If *method=loggamma*:
$$mpmf\\left(F, P\\right) = e^{\\ln\\left(mpmf\\left(F, P\\right)\\right)}$$
$$\\ln\\left(mpmf\\left(F, P\\right)\\right) = \\ln\\left(\\Gamma\\left(n + 1\\right)\\right) + \\sum_{i=1}^k F_i\\times\\ln\\left(P_i\\right) - \\ln\\left(\\Gamma\\left(F_i + 1\\right)\\right)$$
This formula can for example be found in Arnold (2018).
If *method=mprob* the algorithm from García-Pérez (1999) is used:
1. Determine $F^{\\*}$, the counts in descending order, and move the elements in $P$ accordingly creating $P^{\\*}$.
2. Set $pmf = 1$, $t=P_1{\\*}$, $i=2$, $x=0$, and $m=F_1^{\\*}$
3. Set $l = F_i^{\\*}$. For $r=1$ to $l$ do:
* update $x = x + 1$
* if $x > F_1^{\\*}$ then set $t = 1$ (else nothing)
* update $pmf = pmf\\times t\\times P_i^{\\*}\\times \\frac{r + m}{r}$
4. If $i=k$, then go to step 5, otherwise update $i = i + 1$, $m = m + F_i^{\\*}$ and go to step 3
5. If $x < F_1^{\\*}$ then for $r=x + 1$ to $F_1^{\\*}$ update $pmf = pmf\times P_1^{\\*}$
This distribution is used in a Multinomial Goodness-of-Fit Test. The stikpetP library has a function [ts_multinomial_gof](../tests/test_multinomial_gof.html#ts_multinomial_gof) for this, but it uses the multinomial function from the scipy.stats library.
References
----------
Arnold, J. (2018, December 3). Maximum Likelihood for the multinomial distribution (bag of words) [Blog]. Jakuba. https://blog.jakuba.net/maximum-likelihood-for-multinomial-distribution/
Berry, K. J., & Mielke, P. W. (1995). Exact cumulative probabilities for the multinomial distribution. *Educational and Psychological Measurement, 55*(5), 769–772. doi:10.1177/0013164495055005008
Edgeworth, F. Y. (1905). The law of error. *Transactions of the Cambridge Philosophical Society, 20*, 36–66.
García-Pérez, M. A. (1999). MPROB: Computation of multinomial probabilities. *Behavior Research Methods, Instruments, & Computers, 31*(4), 701–705. doi:10.3758/BF03200749
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
--------
>>> freq = [3, 6, 2, 9]
>>> prob = [0.2, 0.3, 0.1, 0.4]
>>> di_mpmf(freq, prob)
0.011863293601775638
'''
k = len(F)
n = sum(F)
if method=='factorial':
mpmf = factorial(n)/prod([factorial(i) for i in F]) * prod([P[i]**F[i] for i in range(k)])
elif method=='gamma':
mpmf = gamma(1 + n)/prod([gamma(F[i]+1) for i in range(k)]) * prod([P[i]**F[i] for i in range(k)])
elif method=='loggamma':
mpmf = exp(lgamma(n+1) + sum([F[i]*log(P[i]) - lgamma(F[i]+1) for i in range(k)]))
elif method=='mprob':
#combine F and p
pair = list(zip(F, P))
#sort on F in descending order
sort_pair = sorted(pair, key=lambda x: x[0], reverse=True)
#split back
F_s, P_s = zip(*sort_pair)
#convert to lists again
F_s = list(F_s)
P_s = list(P_s)
mpmf = 1
t = P_s[0]
i = 2
x = 0
m = F_s[0]
#loop from 2 to k
for i in range(2, k+1):
#set l
l = F_s[i-1]
#for r from 1 to l
for r in range(1, l+1):
#update x
x = x + 1
#check x greater than F_0^*
if x > F_s[0]:
t = 1
#update pmf
mpmf = mpmf*t*P_s[i-1]*(r+m)/r
#update m
m = m + F_s[i-1]
if x < F_s[0]:
for r in range(x+1, F_s[0]+1):
mpmf = mpmf*P_s[0]
return mpmf
def di_mcdf(F, P, method='loggamma'):
'''
Multinomial Cumulative Distribution Function
-----------------------------
This is a function for the cumulative multinomial probability. It returns the probability of a distribution as given in F for a sample size of sum of F, where the probability for each category is given as in P, or a distribution even more rare. It is a generalization of the binomial distribution.
The distribution is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Distributions/Multinomial.html)
Parameters
----------
F : list
the observed counts
P : list
the probabilities for each category
method : {"loggamma", "factorial", "gamma"}, optional
calculation method to use
Returns
-------
mcdf : the probability
Notes
-----
The function first determines all possible arrangements over k categories that sum to n, using the **find_combinations()** function. It then uses the **di_pmf()** function to determine the probability for each of these, and sums those that are less or equal to the sample version.
This distribution is used in a Multinomial Goodness-of-Fit Test. The stikpetP library has a function [ts_multinomial_gof](../tests/test_multinomial_gof.html#ts_multinomial_gof) for this, but it uses the multinomial function from the scipy.stats library.
See Also
--------
stikpetP.distributions.dist_multinomial.di_mpmf : has details on the calculations for the different methods.
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
--------
>>> freq = [3, 6, 2, 9]
>>> prob = [0.2, 0.3, 0.1, 0.4]
>>> di_mcdf(freq, prob)
0.9747905010962287
'''
n = sum(F)
k = len(F)
mpmf = di_mpmf(F, P, method)
mcdf = 0
combinations = he_find_combinations(n,k)
for i in combinations:
pmf = di_mpmf(i, P, method)
if pmf <= mpmf:
mcdf = mcdf + pmf
return mcdf
def he_find_combinations(n, k):
'''
Find Combinations
-----------------------------
Helper function for the multinomial cumulative distribution. Will return all possible combinations to distribute n items over k categories.
Parameters
----------
n : int
the sample size
k : int
the number of categories
Returns
-------
result : list where each element is a list showing a unique combination
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
'''
def helper(n, k, current_combination, result):
if k == 0:
if n == 0:
result.append(current_combination)
return
for i in range(n + 1):
helper(n - i, k - 1, current_combination + [i], result)
result = []
helper(n, k, [], result)
return result
Functions
def di_mcdf(F, P, method='loggamma')
-
Multinomial Cumulative Distribution Function
This is a function for the cumulative multinomial probability. It returns the probability of a distribution as given in F for a sample size of sum of F, where the probability for each category is given as in P, or a distribution even more rare. It is a generalization of the binomial distribution.
The distribution is also described at PeterStatistics.com
Parameters
F
:list
- the observed counts
P
:list
- the probabilities for each category
method
:{"loggamma", "factorial", "gamma"}
, optional- calculation method to use
Returns
mcdf
:the probability
Notes
The function first determines all possible arrangements over k categories that sum to n, using the find_combinations() function. It then uses the di_pmf() function to determine the probability for each of these, and sums those that are less or equal to the sample version.
This distribution is used in a Multinomial Goodness-of-Fit Test. The stikpetP library has a function ts_multinomial_gof for this, but it uses the multinomial function from the scipy.stats library.
See Also
di_mpmf()
- has details on the calculations for the different methods.
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076Examples
>>> freq = [3, 6, 2, 9] >>> prob = [0.2, 0.3, 0.1, 0.4] >>> di_mcdf(freq, prob) 0.9747905010962287
Expand source code
def di_mcdf(F, P, method='loggamma'): ''' Multinomial Cumulative Distribution Function ----------------------------- This is a function for the cumulative multinomial probability. It returns the probability of a distribution as given in F for a sample size of sum of F, where the probability for each category is given as in P, or a distribution even more rare. It is a generalization of the binomial distribution. The distribution is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Distributions/Multinomial.html) Parameters ---------- F : list the observed counts P : list the probabilities for each category method : {"loggamma", "factorial", "gamma"}, optional calculation method to use Returns ------- mcdf : the probability Notes ----- The function first determines all possible arrangements over k categories that sum to n, using the **find_combinations()** function. It then uses the **di_pmf()** function to determine the probability for each of these, and sums those that are less or equal to the sample version. This distribution is used in a Multinomial Goodness-of-Fit Test. The stikpetP library has a function [ts_multinomial_gof](../tests/test_multinomial_gof.html#ts_multinomial_gof) for this, but it uses the multinomial function from the scipy.stats library. See Also -------- stikpetP.distributions.dist_multinomial.di_mpmf : has details on the calculations for the different methods. Author ------ Made by P. Stikker Companion website: https://PeterStatistics.com YouTube channel: https://www.youtube.com/stikpet Donations: https://www.patreon.com/bePatron?u=19398076 Examples -------- >>> freq = [3, 6, 2, 9] >>> prob = [0.2, 0.3, 0.1, 0.4] >>> di_mcdf(freq, prob) 0.9747905010962287 ''' n = sum(F) k = len(F) mpmf = di_mpmf(F, P, method) mcdf = 0 combinations = he_find_combinations(n,k) for i in combinations: pmf = di_mpmf(i, P, method) if pmf <= mpmf: mcdf = mcdf + pmf return mcdf
def di_mpmf(F, P, method='loggamma')
-
Multinomial Probability Mass Function
This is a function for the multinomial probability. It returns the probability of a distribution as given in F for a sample size of sum of F, where the probability for each category is given as in P. It is a generalization of the binomial distribution.
The distribution is also described at PeterStatistics.com
Parameters
F
:list
- the observed counts
P
:list
- the probabilities for each category
method
:{"loggamma", "factorial", "gamma", "mprob"}
, optional- calculation method to use
Returns
mpmf
:the probability
Notes
If method=factorial the following formula is used:
mpmf\left(F, P\right) = \frac{n!}{\prod_{i=1}^{k} \left(F_i!\right)} \times \prod_{i=1}^{k} P_i^{F_i}This formula was most likely already used by for example Edgeworth (1905), but can for example also be found in Berry and Mielke (1995, p. 769).
If method=gamma: mpmf\left(F, P\right) = \frac{\Gamma\left(1+n\right)}{\prod_{i=1}^n \Gamma\left(1+F_i\right)} \times \prod_{i=1}^{k} P_i^{F_i}
If method=loggamma: mpmf\left(F, P\right) = e^{\ln\left(mpmf\left(F, P\right)\right)} \ln\left(mpmf\left(F, P\right)\right) = \ln\left(\Gamma\left(n + 1\right)\right) + \sum_{i=1}^k F_i\times\ln\left(P_i\right) - \ln\left(\Gamma\left(F_i + 1\right)\right)
This formula can for example be found in Arnold (2018).
If method=mprob the algorithm from García-Pérez (1999) is used:
- Determine $F^{*}$, the counts in descending order, and move the elements in $P$ accordingly creating $P^{*}$.
- Set $pmf = 1$, $t=P_1{*}$, $i=2$, $x=0$, and $m=F_1^{*}$
- Set $l = F_i^{*}$. For $r=1$ to $l$ do:
- update $x = x + 1$
- if $x > F_1^{*}$ then set $t = 1$ (else nothing)
- update $pmf = pmf\times t\times P_i^{*}\times \frac{r + m}{r}$
- If $i=k$, then go to step 5, otherwise update $i = i + 1$, $m = m + F_i^{*}$ and go to step 3
- If $x < F_1^{*}$ then for $r=x + 1$ to $F_1^{*}$ update $pmf = pmf imes P_1^{*}$
This distribution is used in a Multinomial Goodness-of-Fit Test. The stikpetP library has a function ts_multinomial_gof for this, but it uses the multinomial function from the scipy.stats library.
References
Arnold, J. (2018, December 3). Maximum Likelihood for the multinomial distribution (bag of words) [Blog]. Jakuba. https://blog.jakuba.net/maximum-likelihood-for-multinomial-distribution/
Berry, K. J., & Mielke, P. W. (1995). Exact cumulative probabilities for the multinomial distribution. Educational and Psychological Measurement, 55(5), 769–772. doi:10.1177/0013164495055005008
Edgeworth, F. Y. (1905). The law of error. Transactions of the Cambridge Philosophical Society, 20, 36–66.
García-Pérez, M. A. (1999). MPROB: Computation of multinomial probabilities. Behavior Research Methods, Instruments, & Computers, 31(4), 701–705. doi:10.3758/BF03200749
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076Examples
>>> freq = [3, 6, 2, 9] >>> prob = [0.2, 0.3, 0.1, 0.4] >>> di_mpmf(freq, prob) 0.011863293601775638
Expand source code
def di_mpmf(F, P, method='loggamma'): ''' Multinomial Probability Mass Function ----------------------------- This is a function for the multinomial probability. It returns the probability of a distribution as given in F for a sample size of sum of F, where the probability for each category is given as in P. It is a generalization of the binomial distribution. The distribution is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Distributions/Multinomial.html) Parameters ---------- F : list the observed counts P : list the probabilities for each category method : {"loggamma", "factorial", "gamma", "mprob"}, optional calculation method to use Returns ------- mpmf : the probability Notes ----- If *method=factorial* the following formula is used: $$mpmf\\left(F, P\\right) = \\frac{n!}{\\prod_{i=1}^{k} \\left(F_i!\\right)} \\times \\prod_{i=1}^{k} P_i^{F_i}$$ This formula was most likely already used by for example Edgeworth (1905), but can for example also be found in Berry and Mielke (1995, p. 769). If *method=gamma*: $$mpmf\\left(F, P\\right) = \\frac{\\Gamma\\left(1+n\\right)}{\\prod_{i=1}^n \\Gamma\\left(1+F_i\\right)} \\times \\prod_{i=1}^{k} P_i^{F_i}$$ If *method=loggamma*: $$mpmf\\left(F, P\\right) = e^{\\ln\\left(mpmf\\left(F, P\\right)\\right)}$$ $$\\ln\\left(mpmf\\left(F, P\\right)\\right) = \\ln\\left(\\Gamma\\left(n + 1\\right)\\right) + \\sum_{i=1}^k F_i\\times\\ln\\left(P_i\\right) - \\ln\\left(\\Gamma\\left(F_i + 1\\right)\\right)$$ This formula can for example be found in Arnold (2018). If *method=mprob* the algorithm from García-Pérez (1999) is used: 1. Determine $F^{\\*}$, the counts in descending order, and move the elements in $P$ accordingly creating $P^{\\*}$. 2. Set $pmf = 1$, $t=P_1{\\*}$, $i=2$, $x=0$, and $m=F_1^{\\*}$ 3. Set $l = F_i^{\\*}$. For $r=1$ to $l$ do: * update $x = x + 1$ * if $x > F_1^{\\*}$ then set $t = 1$ (else nothing) * update $pmf = pmf\\times t\\times P_i^{\\*}\\times \\frac{r + m}{r}$ 4. If $i=k$, then go to step 5, otherwise update $i = i + 1$, $m = m + F_i^{\\*}$ and go to step 3 5. If $x < F_1^{\\*}$ then for $r=x + 1$ to $F_1^{\\*}$ update $pmf = pmf\times P_1^{\\*}$ This distribution is used in a Multinomial Goodness-of-Fit Test. The stikpetP library has a function [ts_multinomial_gof](../tests/test_multinomial_gof.html#ts_multinomial_gof) for this, but it uses the multinomial function from the scipy.stats library. References ---------- Arnold, J. (2018, December 3). Maximum Likelihood for the multinomial distribution (bag of words) [Blog]. Jakuba. https://blog.jakuba.net/maximum-likelihood-for-multinomial-distribution/ Berry, K. J., & Mielke, P. W. (1995). Exact cumulative probabilities for the multinomial distribution. *Educational and Psychological Measurement, 55*(5), 769–772. doi:10.1177/0013164495055005008 Edgeworth, F. Y. (1905). The law of error. *Transactions of the Cambridge Philosophical Society, 20*, 36–66. García-Pérez, M. A. (1999). MPROB: Computation of multinomial probabilities. *Behavior Research Methods, Instruments, & Computers, 31*(4), 701–705. doi:10.3758/BF03200749 Author ------ Made by P. Stikker Companion website: https://PeterStatistics.com YouTube channel: https://www.youtube.com/stikpet Donations: https://www.patreon.com/bePatron?u=19398076 Examples -------- >>> freq = [3, 6, 2, 9] >>> prob = [0.2, 0.3, 0.1, 0.4] >>> di_mpmf(freq, prob) 0.011863293601775638 ''' k = len(F) n = sum(F) if method=='factorial': mpmf = factorial(n)/prod([factorial(i) for i in F]) * prod([P[i]**F[i] for i in range(k)]) elif method=='gamma': mpmf = gamma(1 + n)/prod([gamma(F[i]+1) for i in range(k)]) * prod([P[i]**F[i] for i in range(k)]) elif method=='loggamma': mpmf = exp(lgamma(n+1) + sum([F[i]*log(P[i]) - lgamma(F[i]+1) for i in range(k)])) elif method=='mprob': #combine F and p pair = list(zip(F, P)) #sort on F in descending order sort_pair = sorted(pair, key=lambda x: x[0], reverse=True) #split back F_s, P_s = zip(*sort_pair) #convert to lists again F_s = list(F_s) P_s = list(P_s) mpmf = 1 t = P_s[0] i = 2 x = 0 m = F_s[0] #loop from 2 to k for i in range(2, k+1): #set l l = F_s[i-1] #for r from 1 to l for r in range(1, l+1): #update x x = x + 1 #check x greater than F_0^* if x > F_s[0]: t = 1 #update pmf mpmf = mpmf*t*P_s[i-1]*(r+m)/r #update m m = m + F_s[i-1] if x < F_s[0]: for r in range(x+1, F_s[0]+1): mpmf = mpmf*P_s[0] return mpmf
def he_find_combinations(n, k)
-
Find Combinations
Helper function for the multinomial cumulative distribution. Will return all possible combinations to distribute n items over k categories.
Parameters
n
:int
- the sample size
k
:int
- the number of categories
Returns
result
:list where each element is a list showing a unique combination
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076Expand source code
def he_find_combinations(n, k): ''' Find Combinations ----------------------------- Helper function for the multinomial cumulative distribution. Will return all possible combinations to distribute n items over k categories. Parameters ---------- n : int the sample size k : int the number of categories Returns ------- result : list where each element is a list showing a unique combination Author ------ Made by P. Stikker Companion website: https://PeterStatistics.com YouTube channel: https://www.youtube.com/stikpet Donations: https://www.patreon.com/bePatron?u=19398076 ''' def helper(n, k, current_combination, result): if k == 0: if n == 0: result.append(current_combination) return for i in range(n + 1): helper(n - i, k - 1, current_combination + [i], result) result = [] helper(n, k, [], result) return result