Module `stikpetP.distributions.dist_multinomial`

Expand source code

from math import gamma, prod, factorial, lgamma, log, exp

def di_mpmf(F, P, method='loggamma'):
    '''
    Multinomial Probability Mass Function
    -----------------------------
     
    This is a function for the multinomial probability. It returns the probability of a distribution as given in F for a sample size of sum of F, where the probability for each category is given as in P. It is a generalization of the binomial distribution.

    The distribution is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Distributions/Multinomial.html)
    
    Parameters
    ----------
    F : list
        the observed counts
    P : list
        the probabilities for each category
    method : {"loggamma", "factorial", "gamma", "mprob"}, optional
        calculation method to use
    
    Returns
    -------
    mpmf : the probability
    
    Notes
    -----
    If *method=factorial* the following formula is used:    
    $$mpmf\\left(F, P\\right) = \\frac{n!}{\\prod_{i=1}^{k} \\left(F_i!\\right)} \\times \\prod_{i=1}^{k} P_i^{F_i}$$

    This formula was most likely already used by for example Edgeworth (1905), but can for example also be found in Berry and Mielke (1995, p. 769).

    If *method=gamma*:
    $$mpmf\\left(F, P\\right) = \\frac{\\Gamma\\left(1+n\\right)}{\\prod_{i=1}^n \\Gamma\\left(1+F_i\\right)} \\times \\prod_{i=1}^{k} P_i^{F_i}$$

    If *method=loggamma*:
    $$mpmf\\left(F, P\\right) = e^{\\ln\\left(mpmf\\left(F, P\\right)\\right)}$$
    $$\\ln\\left(mpmf\\left(F, P\\right)\\right) = \\ln\\left(\\Gamma\\left(n + 1\\right)\\right) + \\sum_{i=1}^k F_i\\times\\ln\\left(P_i\\right) - \\ln\\left(\\Gamma\\left(F_i + 1\\right)\\right)$$

    This formula can for example be found in Arnold (2018).

    If *method=mprob* the algorithm from García-Pérez (1999) is used:
    
    1. Determine $F^{\\*}$, the counts in descending order, and move the elements in $P$ accordingly creating $P^{\\*}$.
    2. Set $pmf = 1$, $t=P_1{\\*}$, $i=2$, $x=0$, and $m=F_1^{\\*}$
    3. Set $l = F_i^{\\*}$. For $r=1$ to $l$ do:
         * update $x = x + 1$
         * if $x > F_1^{\\*}$ then set $t = 1$ (else nothing)
         * update $pmf = pmf\\times t\\times P_i^{\\*}\\times \\frac{r + m}{r}$
    4. If $i=k$, then go to step 5, otherwise update $i = i + 1$, $m = m + F_i^{\\*}$ and go to step 3
    5. If $x < F_1^{\\*}$ then for $r=x + 1$ to $F_1^{\\*}$ update $pmf = pmf\times P_1^{\\*}$

    This distribution is used in a Multinomial Goodness-of-Fit Test. The stikpetP library has a function [ts_multinomial_gof](../tests/test_multinomial_gof.html#ts_multinomial_gof) for this, but it uses the multinomial function from the scipy.stats library.
    
    References
    ----------
    Arnold, J. (2018, December 3). Maximum Likelihood for the multinomial distribution (bag of words) [Blog]. Jakuba. https://blog.jakuba.net/maximum-likelihood-for-multinomial-distribution/
    
    Berry, K. J., & Mielke, P. W. (1995). Exact cumulative probabilities for the multinomial distribution. *Educational and Psychological Measurement, 55*(5), 769–772. doi:10.1177/0013164495055005008
    
    Edgeworth, F. Y. (1905). The law of error. *Transactions of the Cambridge Philosophical Society, 20*, 36–66.
    
    García-Pérez, M. A. (1999). MPROB: Computation of multinomial probabilities. *Behavior Research Methods, Instruments, & Computers, 31*(4), 701–705. doi:10.3758/BF03200749
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076

    Examples
    --------
    >>> freq = [3, 6, 2, 9]
    >>> prob = [0.2, 0.3, 0.1, 0.4]
    >>> di_mpmf(freq, prob)
    0.011863293601775638
    
    '''
    k = len(F)
    n = sum(F)

    if method=='factorial':
        mpmf = factorial(n)/prod([factorial(i) for i in F]) * prod([P[i]**F[i] for i in range(k)])    
    elif method=='gamma':
        mpmf = gamma(1 + n)/prod([gamma(F[i]+1) for i in range(k)]) * prod([P[i]**F[i] for i in range(k)])

    elif method=='loggamma':
        mpmf = exp(lgamma(n+1) + sum([F[i]*log(P[i]) - lgamma(F[i]+1) for i in range(k)]))
    
    elif method=='mprob':
    
        #combine F and p
        pair = list(zip(F, P))
        
        #sort on F in descending order
        sort_pair = sorted(pair, key=lambda x: x[0], reverse=True)
        
        #split back
        F_s, P_s = zip(*sort_pair)
        
        #convert to lists again
        F_s = list(F_s)
        P_s = list(P_s)
        
        mpmf = 1
        t = P_s[0]
        i = 2
        x = 0
        m = F_s[0]    
        
        #loop from 2 to k
        for i in range(2, k+1):
            #set l
            l = F_s[i-1]
            #for r from 1 to l
            for r in range(1, l+1):
                #update x
                x = x + 1
        
                #check x greater than F_0^*
                if x > F_s[0]:
                    t = 1
                    
                #update pmf
                mpmf = mpmf*t*P_s[i-1]*(r+m)/r
        
            #update m
            m = m + F_s[i-1]     
        
        if x < F_s[0]:
            for r in range(x+1, F_s[0]+1):
                mpmf = mpmf*P_s[0]

    return mpmf

def di_mcdf(F, P, method='loggamma'):
    '''
    Multinomial Cumulative Distribution Function
    -----------------------------
     
    This is a function for the cumulative multinomial probability. It returns the probability of a distribution as given in F for a sample size of sum of F, where the probability for each category is given as in P, or a distribution even more rare. It is a generalization of the binomial distribution.

    The distribution is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Distributions/Multinomial.html)
    
    Parameters
    ----------
    F : list
        the observed counts
    P : list
        the probabilities for each category
    method : {"loggamma", "factorial", "gamma"}, optional
        calculation method to use
    
    Returns
    -------
    mcdf : the probability
    
    Notes
    -----
    The function first determines all possible arrangements over k categories that sum to n, using the **find_combinations()** function. It then uses the **di_pmf()** function to determine the probability for each of these, and sums those that are less or equal to the sample version.

    This distribution is used in a Multinomial Goodness-of-Fit Test. The stikpetP library has a function [ts_multinomial_gof](../tests/test_multinomial_gof.html#ts_multinomial_gof) for this, but it uses the multinomial function from the scipy.stats library.

    See Also
    --------
    stikpetP.distributions.dist_multinomial.di_mpmf : has details on the calculations for the different methods.
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076

    Examples
    --------
    >>> freq = [3, 6, 2, 9]
    >>> prob = [0.2, 0.3, 0.1, 0.4]
    >>> di_mcdf(freq, prob)
    0.9747905010962287
    
    '''
    n = sum(F)
    k = len(F)
    mpmf = di_mpmf(F, P, method)
    
    mcdf = 0
    combinations = he_find_combinations(n,k)
    for i in combinations:
        pmf = di_mpmf(i, P, method)
        if pmf <= mpmf:
            mcdf = mcdf + pmf
    
    return mcdf

def he_find_combinations(n, k):
    '''
    Find Combinations
    -----------------------------
    Helper function for the multinomial cumulative distribution. Will return all possible combinations to distribute n items over k categories.
    
    Parameters
    ----------
    n : int
        the sample size
    k : int
        the number of categories
    
    Returns
    -------
    result : list where each element is a list showing a unique combination
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076
    
    '''
    def helper(n, k, current_combination, result):
        if k == 0:
            if n == 0:
                result.append(current_combination)
            return
        for i in range(n + 1):
            helper(n - i, k - 1, current_combination + [i], result)

    result = []
    helper(n, k, [], result)
    return result

Functions

def di_mcdf(F, P, method='loggamma')

Multinomial Cumulative Distribution Function

This is a function for the cumulative multinomial probability. It returns the probability of a distribution as given in F for a sample size of sum of F, where the probability for each category is given as in P, or a distribution even more rare. It is a generalization of the binomial distribution.

The distribution is also described at PeterStatistics.com

Parameters

F : list: the observed counts
P : list: the probabilities for each category
method : {"loggamma", "factorial", "gamma"}, optional: calculation method to use

Returns

mcdf : the probability

Notes

The function first determines all possible arrangements over k categories that sum to n, using the find_combinations() function. It then uses the di_pmf() function to determine the probability for each of these, and sums those that are less or equal to the sample version.

This distribution is used in a Multinomial Goodness-of-Fit Test. The stikpetP library has a function ts_multinomial_gof for this, but it uses the multinomial function from the scipy.stats library.

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076

Examples

>>> freq = [3, 6, 2, 9]
>>> prob = [0.2, 0.3, 0.1, 0.4]
>>> di_mcdf(freq, prob)
0.9747905010962287

Expand source code

def di_mcdf(F, P, method='loggamma'):
    '''
    Multinomial Cumulative Distribution Function
    -----------------------------
     
    This is a function for the cumulative multinomial probability. It returns the probability of a distribution as given in F for a sample size of sum of F, where the probability for each category is given as in P, or a distribution even more rare. It is a generalization of the binomial distribution.

    The distribution is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Distributions/Multinomial.html)
    
    Parameters
    ----------
    F : list
        the observed counts
    P : list
        the probabilities for each category
    method : {"loggamma", "factorial", "gamma"}, optional
        calculation method to use
    
    Returns
    -------
    mcdf : the probability
    
    Notes
    -----
    The function first determines all possible arrangements over k categories that sum to n, using the **find_combinations()** function. It then uses the **di_pmf()** function to determine the probability for each of these, and sums those that are less or equal to the sample version.

    This distribution is used in a Multinomial Goodness-of-Fit Test. The stikpetP library has a function [ts_multinomial_gof](../tests/test_multinomial_gof.html#ts_multinomial_gof) for this, but it uses the multinomial function from the scipy.stats library.

    See Also
    --------
    stikpetP.distributions.dist_multinomial.di_mpmf : has details on the calculations for the different methods.
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076

    Examples
    --------
    >>> freq = [3, 6, 2, 9]
    >>> prob = [0.2, 0.3, 0.1, 0.4]
    >>> di_mcdf(freq, prob)
    0.9747905010962287
    
    '''
    n = sum(F)
    k = len(F)
    mpmf = di_mpmf(F, P, method)
    
    mcdf = 0
    combinations = he_find_combinations(n,k)
    for i in combinations:
        pmf = di_mpmf(i, P, method)
        if pmf <= mpmf:
            mcdf = mcdf + pmf
    
    return mcdf

def di_mpmf(F, P, method='loggamma')

Multinomial Probability Mass Function

This is a function for the multinomial probability. It returns the probability of a distribution as given in F for a sample size of sum of F, where the probability for each category is given as in P. It is a generalization of the binomial distribution.

The distribution is also described at PeterStatistics.com

Parameters

F : list: the observed counts
P : list: the probabilities for each category
method : {"loggamma", "factorial", "gamma", "mprob"}, optional: calculation method to use

Returns

mpmf : the probability

Notes

If method=factorial the following formula is used:
$mpmf\left(F, P\right) = \frac{n!}{\prod_{i=1}^{k} \left(F_i!\right)} \times \prod_{i=1}^{k} P_i^{F_i}$

This formula was most likely already used by for example Edgeworth (1905), but can for example also be found in Berry and Mielke (1995, p. 769).

If method=gamma: $mpmf\left(F, P\right) = \frac{\Gamma\left(1+n\right)}{\prod_{i=1}^n \Gamma\left(1+F_i\right)} \times \prod_{i=1}^{k} P_i^{F_i}$

If method=loggamma: $mpmf\left(F, P\right) = e^{\ln\left(mpmf\left(F, P\right)\right)}$ $\ln\left(mpmf\left(F, P\right)\right) = \ln\left(\Gamma\left(n + 1\right)\right) + \sum_{i=1}^k F_i\times\ln\left(P_i\right) - \ln\left(\Gamma\left(F_i + 1\right)\right)$

This formula can for example be found in Arnold (2018).

If method=mprob the algorithm from García-Pérez (1999) is used:

Determine $F^{*}$, the counts in descending order, and move the elements in $P$ accordingly creating $P^{*}$.
Set $pmf = 1$, $t=P_1{*}$, $i=2$, $x=0$, and $m=F_1^{*}$
Set $l = F_i^{*}$. For $r=1$ to $l$ do:
- update $x = x + 1$
- if $x > F_1^{*}$ then set $t = 1$ (else nothing)
- update $pmf = pmf\times t\times P_i^{*}\times \frac{r + m}{r}$
If $i=k$, then go to step 5, otherwise update $i = i + 1$, $m = m + F_i^{*}$ and go to step 3
If $x < F_1^{*}$ then for $r=x + 1$ to $F_1^{*}$ update $pmf = pmf imes P_1^{*}$

This distribution is used in a Multinomial Goodness-of-Fit Test. The stikpetP library has a function ts_multinomial_gof for this, but it uses the multinomial function from the scipy.stats library.

References

Arnold, J. (2018, December 3). Maximum Likelihood for the multinomial distribution (bag of words) [Blog]. Jakuba. https://blog.jakuba.net/maximum-likelihood-for-multinomial-distribution/

Berry, K. J., & Mielke, P. W. (1995). Exact cumulative probabilities for the multinomial distribution. Educational and Psychological Measurement, 55(5), 769–772. doi:10.1177/0013164495055005008

Edgeworth, F. Y. (1905). The law of error. Transactions of the Cambridge Philosophical Society, 20, 36–66.

García-Pérez, M. A. (1999). MPROB: Computation of multinomial probabilities. Behavior Research Methods, Instruments, & Computers, 31(4), 701–705. doi:10.3758/BF03200749

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076

Examples

>>> freq = [3, 6, 2, 9]
>>> prob = [0.2, 0.3, 0.1, 0.4]
>>> di_mpmf(freq, prob)
0.011863293601775638

Expand source code

def di_mpmf(F, P, method='loggamma'):
    '''
    Multinomial Probability Mass Function
    -----------------------------
     
    This is a function for the multinomial probability. It returns the probability of a distribution as given in F for a sample size of sum of F, where the probability for each category is given as in P. It is a generalization of the binomial distribution.

    The distribution is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Distributions/Multinomial.html)
    
    Parameters
    ----------
    F : list
        the observed counts
    P : list
        the probabilities for each category
    method : {"loggamma", "factorial", "gamma", "mprob"}, optional
        calculation method to use
    
    Returns
    -------
    mpmf : the probability
    
    Notes
    -----
    If *method=factorial* the following formula is used:    
    $$mpmf\\left(F, P\\right) = \\frac{n!}{\\prod_{i=1}^{k} \\left(F_i!\\right)} \\times \\prod_{i=1}^{k} P_i^{F_i}$$

    This formula was most likely already used by for example Edgeworth (1905), but can for example also be found in Berry and Mielke (1995, p. 769).

    If *method=gamma*:
    $$mpmf\\left(F, P\\right) = \\frac{\\Gamma\\left(1+n\\right)}{\\prod_{i=1}^n \\Gamma\\left(1+F_i\\right)} \\times \\prod_{i=1}^{k} P_i^{F_i}$$

    If *method=loggamma*:
    $$mpmf\\left(F, P\\right) = e^{\\ln\\left(mpmf\\left(F, P\\right)\\right)}$$
    $$\\ln\\left(mpmf\\left(F, P\\right)\\right) = \\ln\\left(\\Gamma\\left(n + 1\\right)\\right) + \\sum_{i=1}^k F_i\\times\\ln\\left(P_i\\right) - \\ln\\left(\\Gamma\\left(F_i + 1\\right)\\right)$$

    This formula can for example be found in Arnold (2018).

    If *method=mprob* the algorithm from García-Pérez (1999) is used:
    
    1. Determine $F^{\\*}$, the counts in descending order, and move the elements in $P$ accordingly creating $P^{\\*}$.
    2. Set $pmf = 1$, $t=P_1{\\*}$, $i=2$, $x=0$, and $m=F_1^{\\*}$
    3. Set $l = F_i^{\\*}$. For $r=1$ to $l$ do:
         * update $x = x + 1$
         * if $x > F_1^{\\*}$ then set $t = 1$ (else nothing)
         * update $pmf = pmf\\times t\\times P_i^{\\*}\\times \\frac{r + m}{r}$
    4. If $i=k$, then go to step 5, otherwise update $i = i + 1$, $m = m + F_i^{\\*}$ and go to step 3
    5. If $x < F_1^{\\*}$ then for $r=x + 1$ to $F_1^{\\*}$ update $pmf = pmf\times P_1^{\\*}$

    This distribution is used in a Multinomial Goodness-of-Fit Test. The stikpetP library has a function [ts_multinomial_gof](../tests/test_multinomial_gof.html#ts_multinomial_gof) for this, but it uses the multinomial function from the scipy.stats library.
    
    References
    ----------
    Arnold, J. (2018, December 3). Maximum Likelihood for the multinomial distribution (bag of words) [Blog]. Jakuba. https://blog.jakuba.net/maximum-likelihood-for-multinomial-distribution/
    
    Berry, K. J., & Mielke, P. W. (1995). Exact cumulative probabilities for the multinomial distribution. *Educational and Psychological Measurement, 55*(5), 769–772. doi:10.1177/0013164495055005008
    
    Edgeworth, F. Y. (1905). The law of error. *Transactions of the Cambridge Philosophical Society, 20*, 36–66.
    
    García-Pérez, M. A. (1999). MPROB: Computation of multinomial probabilities. *Behavior Research Methods, Instruments, & Computers, 31*(4), 701–705. doi:10.3758/BF03200749
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076

    Examples
    --------
    >>> freq = [3, 6, 2, 9]
    >>> prob = [0.2, 0.3, 0.1, 0.4]
    >>> di_mpmf(freq, prob)
    0.011863293601775638
    
    '''
    k = len(F)
    n = sum(F)

    if method=='factorial':
        mpmf = factorial(n)/prod([factorial(i) for i in F]) * prod([P[i]**F[i] for i in range(k)])    
    elif method=='gamma':
        mpmf = gamma(1 + n)/prod([gamma(F[i]+1) for i in range(k)]) * prod([P[i]**F[i] for i in range(k)])

    elif method=='loggamma':
        mpmf = exp(lgamma(n+1) + sum([F[i]*log(P[i]) - lgamma(F[i]+1) for i in range(k)]))
    
    elif method=='mprob':
    
        #combine F and p
        pair = list(zip(F, P))
        
        #sort on F in descending order
        sort_pair = sorted(pair, key=lambda x: x[0], reverse=True)
        
        #split back
        F_s, P_s = zip(*sort_pair)
        
        #convert to lists again
        F_s = list(F_s)
        P_s = list(P_s)
        
        mpmf = 1
        t = P_s[0]
        i = 2
        x = 0
        m = F_s[0]    
        
        #loop from 2 to k
        for i in range(2, k+1):
            #set l
            l = F_s[i-1]
            #for r from 1 to l
            for r in range(1, l+1):
                #update x
                x = x + 1
        
                #check x greater than F_0^*
                if x > F_s[0]:
                    t = 1
                    
                #update pmf
                mpmf = mpmf*t*P_s[i-1]*(r+m)/r
        
            #update m
            m = m + F_s[i-1]     
        
        if x < F_s[0]:
            for r in range(x+1, F_s[0]+1):
                mpmf = mpmf*P_s[0]

    return mpmf

def he_find_combinations(n, k)

Find Combinations

Helper function for the multinomial cumulative distribution. Will return all possible combinations to distribute n items over k categories.

Parameters

n : int: the sample size
k : int: the number of categories

Returns

result : list where each element is a list showing a unique combination

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076

Expand source code

def he_find_combinations(n, k):
    '''
    Find Combinations
    -----------------------------
    Helper function for the multinomial cumulative distribution. Will return all possible combinations to distribute n items over k categories.
    
    Parameters
    ----------
    n : int
        the sample size
    k : int
        the number of categories
    
    Returns
    -------
    result : list where each element is a list showing a unique combination
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076
    
    '''
    def helper(n, k, current_combination, result):
        if k == 0:
            if n == 0:
                result.append(current_combination)
            return
        for i in range(n + 1):
            helper(n - i, k - 1, current_combination + [i], result)

    result = []
    helper(n, k, [], result)
    return result