Module stikpetP.tests.test_trinomial_ps

import pandas as pd
from scipy.stats import binom
from statistics import NormalDist
from ..other.table_cross import tab_cross
from ..distributions.dist_multinomial import di_mpmf

def ts_trinomial_ps(field1, field2, levels=None, dmu=0):
    '''
    Trinomial Test (Paired Samples)
    -------------------------------
    A test similar to the sign test, but one that also takes the tied pairs into account.
    
    Parameters
    ----------
    field1 : pandas series
        the ordinal or scale scores of the first variable
    field2 : pandas series
        the ordinal or scale scores of the second variable
    levels : list or dictionary, optional
        the categories to use
    dmu : float, optional
        the hypothesized difference between each pair. Default is 0.
        
    Returns
    -------
    Dataframe with:
    
    * *n pos*, the number of pairs with a difference above dmu
    * *n neg*, the number of pairs with a difference below dmu
    * *n 0*, the number of pairs with a difference equal to dmu
    * *p-value*, the p-value (significance)
    
    Notes
    -----
    The formula used (Bian et al., 2009, pp. 5-6):
    $$sig. = 2\\times\\text{TRI}\\left(\\left(n_{pos},n_{neg},n_{0}\\right),\\left(p_{pos},p_{neg},p_{0}\\right)\\right)$$
    
    With:
    $$n_{pos}=\\sum_{i=1}^n \\begin{cases} 1 & \\text{ if } d_i>d_{H0} \\\\ 0 & \\text{ if } d_i\\leq d_{H0} \\end{cases}$$
    $$n_{neg}=\\sum_{i=1}^n \\begin{cases} 1 & \\text{ if } d_i<d_{H0} \\\\ 0 & \\text{ if } d_i\\geq d_{H0} \\end{cases}$$
    $$n_{0}=\\sum_{i=1}^n \\begin{cases} 1 & \\text{ if } d_i = d_{H0} \\\\ 0 & \\text{ if } d_i\\neq d_{H0} \\end{cases}$$
    $$d_i = x_i - y_i$$
    $$p_0 = \\frac{n_0}{n}$$
    $$p_{pos} = p_{neg} = \\frac{1-p_0}{2}$$
    
    The cumulative mass function of the trinomial distribution is then calculated using:
    $$\\text{TRI} \\left(\\left(n_{pos},n_{neg},n_{0}\\right), \\left(p_{pos},p_{neg},p_{0}\\right)\\right) = \\sum_{i=n_d}^n \\sum_{j=0}^{\\lfloor \\frac{n-i}{2}\\rfloor} \\text{tri}\\left(\\left(j, j+i, n-j-\\left(j+i\\right)\\right), \\left(p_{pos}, p_{neg}, p_0\\right)\\right)$$
    
    The probability mass function of the trinomial distribution is determined using the multinomial pmf function di_mpmf(), which for the trinomial distribution becomes:
    $$\\text{tri} \\left(\\left(a,b,c\\right), \\left(p_{pos},p_{neg},p_{0}\\right)\\right) = \\frac{n!}{a!\\times b!\\times c!}\\times p_{pos}^a\\times p_{neg}^b\\times p_0^c$$
    
    With:
    $$n_d = \\left|n_{pos} - n_{neg}\\right|$$
    
    * \\(n\\), the total number of pairs (\\(n_{pos} + n_{neg} + n_{0}\\))
    * \\(n_{pos}\\), the number of pairs with a difference greater than the null hypothesis
    * \\(n_{neg}\\), the number of pairs with a difference less than the null hypothesis
    * \\(n_{0}\\), the number of pairs with a difference equal to the null hypothesis
    * \\(d_{H0}\\), the difference according to the null hypothesis, usually 0
    * \\(x_i\\), is the i-th score from the first variable
    * \\(y_i\\), is the i-th score from the second variable
    
    References
    ----------
    Bian, G., McAleer, M., & Wong, W.-K. (2009). A trinomial test for paired data when there are many ties. *SSRN Electronic Journal*. doi:10.2139/ssrn.1410589
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076   
    
    '''
    
    ct = tab_cross(field1, field2, order1=levels, order2=levels)
    k1 = ct.shape[0]
    k2 = ct.shape[1]
    
    if levels is not None:
        #replace row labels with numeric score
        ct = ct.reset_index(drop=True) 
        ct.columns = [i for i in range(0, k2)]
    
    npos = 0
    nneg = 0
    n0 = 0
    for i in range(0, k1):
        for j in range(0, k2):
            if ct.index[i] - ct.columns[j] > dmu:
                npos = npos + ct.iloc[i, j]
            elif ct.index[i] - ct.columns[j] < dmu:
                nneg = nneg + ct.iloc[i, j]
            else:
                n0 = n0 + ct.iloc[i, j]
                
    nc = nneg + npos
    nMin = npos
    nMax = nneg
    
    n = npos + nneg + n0
    nd = abs(npos - nneg)
    p0 = n0 / n
    ppos = (1 - p0) / 2
    pneg = ppos
    probs = [ppos, pneg, p0]
    
    ns = [0]*3
    sig = 0
    for z in range(nd, n+1):
        for j in range(0, int((n - z)/2)+1):
            ns[0] = j
            ns[1] = j + z
            ns[2] = n - j - (j + z)
            sig = sig + di_mpmf(ns, probs)
    
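    # double the one-sided tail probability; pVal does not exist yet in the
    # else branch, so the complement is taken from sig
    if sig < 0.5:
        pVal = sig * 2
    else:
        pVal = 2 * (1 - sig)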
    
    res = pd.DataFrame([[npos, nneg, n0, pVal]])
    res.columns = ["n pos", "n neg", "n 0", "p-value"]
    
    return res

Functions

def ts_trinomial_ps(field1, field2, levels=None, dmu=0)

Trinomial Test (Paired Samples)

A test similar to the sign test, but one that also takes the tied pairs into account.

Parameters

field1 : pandas series
the ordinal or scale scores of the first variable
field2 : pandas series
the ordinal or scale scores of the second variable
levels : list or dictionary, optional
the categories to use
dmu : float, optional
the hypothesized difference between each pair. Default is 0.

Returns

Dataframe with:
 
  • n pos, the number of pairs with a difference above dmu
  • n neg, the number of pairs with a difference below dmu
  • n 0, the number of pairs with a difference equal to dmu
  • p-value, the p-value (significance)

Notes

The formula used (Bian et al., 2009, pp. 5-6):
$$sig. = 2\times\text{TRI}\left(\left(n_{pos},n_{neg},n_{0}\right),\left(p_{pos},p_{neg},p_{0}\right)\right)$$

With:
$$n_{pos}=\sum_{i=1}^n \begin{cases} 1 & \text{ if } d_i>d_{H0} \\ 0 & \text{ if } d_i\leq d_{H0} \end{cases}$$
$$n_{neg}=\sum_{i=1}^n \begin{cases} 1 & \text{ if } d_i<d_{H0} \\ 0 & \text{ if } d_i\geq d_{H0} \end{cases}$$
$$n_{0}=\sum_{i=1}^n \begin{cases} 1 & \text{ if } d_i = d_{H0} \\ 0 & \text{ if } d_i\neq d_{H0} \end{cases}$$
$$d_i = x_i - y_i$$
$$p_0 = \frac{n_0}{n}$$
$$p_{pos} = p_{neg} = \frac{1-p_0}{2}$$
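The counting step can be sketched directly on the paired scores, without the cross table the library builds. This is a minimal illustration, not the library's implementation: the helper name `count_signs` and the sample data are hypothetical, and `dmu` plays the role of \(d_{H0}\).

```python
def count_signs(x, y, dmu=0):
    """Count pairs whose difference d_i = x_i - y_i is above, below, or equal to dmu."""
    n_pos = n_neg = n_0 = 0
    for xi, yi in zip(x, y):
        d = xi - yi
        if d > dmu:
            n_pos += 1
        elif d < dmu:
            n_neg += 1
        else:
            n_0 += 1
    return n_pos, n_neg, n_0


# hypothetical paired scores, chosen only for illustration
x = [3, 5, 2, 4, 4, 1]
y = [1, 5, 4, 2, 4, 3]

n_pos, n_neg, n_0 = count_signs(x, y)
n = n_pos + n_neg + n_0

# probabilities under the null hypothesis: ties keep their observed share,
# the remainder is split evenly between positive and negative differences
p_0 = n_0 / n
p_pos = p_neg = (1 - p_0) / 2
```

Every pair lands in exactly one of the three counts, so they always sum to the number of pairs.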

The cumulative mass function of the trinomial distribution is then calculated using:
$$\text{TRI} \left(\left(n_{pos},n_{neg},n_{0}\right), \left(p_{pos},p_{neg},p_{0}\right)\right) = \sum_{i=n_d}^n \sum_{j=0}^{\lfloor \frac{n-i}{2}\rfloor} \text{tri}\left(\left(j, j+i, n-j-\left(j+i\right)\right), \left(p_{pos}, p_{neg}, p_0\right)\right)$$

The probability mass function of the trinomial distribution is determined using the multinomial pmf function di_mpmf(), which for the trinomial distribution becomes:
$$\text{tri} \left(\left(a,b,c\right), \left(p_{pos},p_{neg},p_{0}\right)\right) = \frac{n!}{a!\times b!\times c!}\times p_{pos}^a\times p_{neg}^b\times p_0^c$$
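The trinomial pmf is short enough to write out with the standard library. The sketch below is a stand-in for illustration only; it is not `di_mpmf()`, whose internals are not shown on this page, and the name `tri_pmf` is hypothetical.

```python
from math import factorial


def tri_pmf(counts, probs):
    """Trinomial pmf: n! / (a! b! c!) * p_pos^a * p_neg^b * p_0^c, with n = a + b + c."""
    a, b, c = counts
    p_pos, p_neg, p_0 = probs
    n = a + b + c
    # the multinomial coefficient is always an integer, so // is exact
    coefficient = factorial(n) // (factorial(a) * factorial(b) * factorial(c))
    return coefficient * p_pos**a * p_neg**b * p_0**c


# 4 pairs: 2 positive, 1 negative, 1 tie, with p_pos = p_neg = 0.25 and p_0 = 0.5
value = tri_pmf((2, 1, 1), (0.25, 0.25, 0.5))
```

Here the coefficient is 4!/(2! 1! 1!) = 12 and the probability term is 0.25^2 x 0.25 x 0.5 = 0.0078125, so `value` is 0.09375.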

With:
$$n_d = \left|n_{pos} - n_{neg}\right|$$

  • n, the total number of pairs (n_{pos} + n_{neg} + n_{0})
  • n_{pos}, the number of pairs with a difference greater than the null hypothesis
  • n_{neg}, the number of pairs with a difference less than the null hypothesis
  • n_{0}, the number of pairs with a difference equal to the null hypothesis
  • d_{H0}, the difference according to the null hypothesis, usually 0
  • x_i, is the i-th score from the first variable
  • y_i, is the i-th score from the second variable

References

Bian, G., McAleer, M., & Wong, W.-K. (2009). A trinomial test for paired data when there are many ties. SSRN Electronic Journal. doi:10.2139/ssrn.1410589

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
