Module stikpetP.tests.test_trinomial_ps
import pandas as pd
from scipy.stats import binom
from statistics import NormalDist
from ..other.table_cross import tab_cross
from ..distributions.dist_multinomial import di_mpmf
def ts_trinomial_ps(field1, field2, levels=None, dmu=0):
    '''
Trinomial Test (Paired Samples)
-------------------------------
A test similar to the sign test, but one that also includes the tied pairs.
Parameters
----------
field1 : pandas series
the ordinal or scale scores of the first variable
field2 : pandas series
the ordinal or scale scores of the second variable
levels : list or dictionary, optional
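the categories to use, in order. If provided, the categories are replaced by the scores 0, 1, 2, ... in that order; otherwise the scores themselves must be numeric.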
dmu : float, optional
the hypothesized difference between each pair. Default is 0.
Returns
-------
Dataframe with:
* *n pos*, the number of pairs with a difference above dmu
* *n neg*, the number of pairs with a difference below dmu
* *n 0*, the number of pairs with a difference equal to dmu
* *p-value*, the p-value (significance)
Notes
-----
The formula used (Bian et al., 2009, pp. 5-6):
$$sig. = 2\\times\\text{TRI}\\left(\\left(n_{pos},n_{neg},n_{0}\\right),\\left(p_{pos},p_{neg},p_{0}\\right)\\right)$$
With:
$$n_{pos}=\\sum_{i=1}^n \\begin{cases} 1 & \\text{ if } d_i>d_{H0} \\\\ 0 & \\text{ if } d_i\\leq d_{H0} \\end{cases}$$
$$n_{neg}=\\sum_{i=1}^n \\begin{cases} 1 & \\text{ if } d_i<d_{H0} \\\\ 0 & \\text{ if } d_i\\geq d_{H0} \\end{cases}$$
$$n_{0}=\\sum_{i=1}^n \\begin{cases} 1 & \\text{ if } d_i = d_{H0} \\\\ 0 & \\text{ if } d_i\\neq d_{H0} \\end{cases}$$
$$d_i = x_i - y_i$$
$$p_0 = \\frac{n_0}{n}$$
$$p_{pos} = p_{neg} = \\frac{1-p_0}{2}$$
The cumulative (upper-tail) probability of the trinomial distribution is then calculated using:
$$\\text{TRI} \\left(\\left(n_{pos},n_{neg},n_{0}\\right), \\left(p_{pos},p_{neg},p_{0}\\right)\\right) = \\sum_{i=n_d}^n \\sum_{j=0}^{\\lfloor \\frac{n-i}{2}\\rfloor} \\text{tri}\\left(\\left(j, j+i, n-j-\\left(j+i\\right)\\right), \\left(p_{pos}, p_{neg}, p_0\\right)\\right)$$
The probability mass function of the trinomial distribution is determined using the multinomial pmf function di_mpmf(), which for the trinomial distribution becomes:
$$\\text{tri} \\left(\\left(a,b,c\\right), \\left(p_{pos},p_{neg},p_{0}\\right)\\right) = \\frac{n!}{a!\\times b!\\times c!}\\times p_{pos}^a\\times p_{neg}^b\\times p_0^c$$
With:
$$n_d = \\left|n_{pos} - n_{neg}\\right|$$
* \\(n\\), the total number of pairs (including the tied pairs)
* \\(n_{pos}\\), the number of pairs with a difference greater than the null hypothesis value
* \\(n_{neg}\\), the number of pairs with a difference less than the null hypothesis value
* \\(n_{0}\\), the number of pairs with a difference equal to the null hypothesis value
* \\(d_{H0}\\), the difference according to the null hypothesis, usually 0
* \\(x_i\\), the i-th score from the first variable
* \\(y_i\\), the i-th score from the second variable
References
----------
Bian, G., McAleer, M., & Wong, W.-K. (2009). A trinomial test for paired data when there are many ties. *SSRN Electronic Journal*. doi:10.2139/ssrn.1410589
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
'''
    ct = tab_cross(field1, field2, order1=levels, order2=levels)
    k1 = ct.shape[0]
    k2 = ct.shape[1]
    if levels is not None:
        # replace the row and column labels with the numeric scores 0, 1, ..., k-1
        ct = ct.reset_index(drop=True)
        ct.columns = [i for i in range(0, k2)]
    # count the pairs with a difference above, below, or equal to dmu
    npos = 0
    nneg = 0
    n0 = 0
    for i in range(0, k1):
        for j in range(0, k2):
            if ct.index[i] - ct.columns[j] > dmu:
                npos = npos + ct.iloc[i, j]
            elif ct.index[i] - ct.columns[j] < dmu:
                nneg = nneg + ct.iloc[i, j]
            else:
                n0 = n0 + ct.iloc[i, j]
    # null-hypothesis probabilities: tie proportion from the data, the rest split equally
    n = npos + nneg + n0
    nd = abs(npos - nneg)
    p0 = n0 / n
    ppos = (1 - p0) / 2
    pneg = ppos
    probs = [ppos, pneg, p0]
    # TRI: probability of a net difference (neg minus pos) of at least nd
    ns = [0]*3
    sig = 0
    for z in range(nd, n+1):
        for j in range(0, int((n - z)/2)+1):
            ns[0] = j
            ns[1] = j + z
            ns[2] = n - j - (j + z)
            sig = sig + di_mpmf(ns, probs)
    # two-sided p-value
    if sig < 0.5:
        pVal = sig * 2
    else:
        pVal = 2 * (1 - sig)
    res = pd.DataFrame([[npos, nneg, n0, pVal]])
    res.columns = ["n pos", "n neg", "n 0", "p-value"]
    return res
Functions
def ts_trinomial_ps(field1, field2, levels=None, dmu=0)
Trinomial Test (Paired Samples)
A test similar to the sign test, but one that also includes the tied pairs.
Parameters
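field1 : pandas series
the ordinal or scale scores of the first variable
field2 : pandas series
the ordinal or scale scores of the second variable
levels : list or dictionary, optional
the categories to use, in order. If provided, the categories are replaced by the scores 0, 1, 2, ... in that order; otherwise the scores themselves must be numeric.
dmu : float, optional
the hypothesized difference between each pair. Default is 0.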
Returns
Dataframe with:
- n pos, the number of pairs with a difference above dmu
- n neg, the number of pairs with a difference below dmu
- n 0, the number of pairs with a difference equal to dmu
- p-value, the p-value (significance)
Notes
The formula used (Bian et al., 2009, pp. 5-6):
$$sig. = 2\times\text{TRI}\left(\left(n_{pos},n_{neg},n_{0}\right),\left(p_{pos},p_{neg},p_{0}\right)\right)$$
With:
$$n_{pos}=\sum_{i=1}^n \begin{cases} 1 & \text{ if } d_i>d_{H0} \\ 0 & \text{ if } d_i\leq d_{H0} \end{cases}$$
$$n_{neg}=\sum_{i=1}^n \begin{cases} 1 & \text{ if } d_i<d_{H0} \\ 0 & \text{ if } d_i\geq d_{H0} \end{cases}$$
$$n_{0}=\sum_{i=1}^n \begin{cases} 1 & \text{ if } d_i = d_{H0} \\ 0 & \text{ if } d_i\neq d_{H0} \end{cases}$$
$$d_i = x_i - y_i$$
$$p_0 = \frac{n_0}{n}$$
$$p_{pos} = p_{neg} = \frac{1-p_0}{2}$$
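The counting rules and probability estimates above can be sketched directly on two lists of paired scores. This is a minimal sketch assuming numeric scores; the helper name `count_signs` is hypothetical and not part of stikpetP, whose function first builds a cross table with tab_cross() and then counts per cell.

```python
def count_signs(x, y, dmu=0):
    """Hypothetical helper: count pairs with d_i = x_i - y_i above, below,
    or equal to dmu, and return the null-hypothesis probabilities."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of scores")
    npos = nneg = n0 = 0
    for xi, yi in zip(x, y):
        d = xi - yi
        if d > dmu:
            npos += 1
        elif d < dmu:
            nneg += 1
        else:
            n0 += 1
    n = npos + nneg + n0          # total number of pairs, ties included
    p0 = n0 / n                   # tie proportion estimates p_0
    ppos = pneg = (1 - p0) / 2    # remaining probability split equally under H0
    return npos, nneg, n0, (ppos, pneg, p0)


print(count_signs([3, 2, 5, 1], [1, 2, 4, 1]))  # (2, 0, 2, (0.25, 0.25, 0.5))
```

Here the differences are 2, 0, 1 and 0, so two pairs are positive, none negative and two tied, giving p_0 = 0.5 and p_pos = p_neg = 0.25.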
The cumulative (upper-tail) probability of the trinomial distribution is then calculated using:
$$\text{TRI} \left(\left(n_{pos},n_{neg},n_{0}\right), \left(p_{pos},p_{neg},p_{0}\right)\right) = \sum_{i=n_d}^n \sum_{j=0}^{\lfloor \frac{n-i}{2}\rfloor} \text{tri}\left(\left(j, j+i, n-j-\left(j+i\right)\right), \left(p_{pos}, p_{neg}, p_0\right)\right)$$
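The bounds of the double sum can be checked by enumeration. This is a small sketch with hypothetical helper names. It lists the outcome triples (first count, second count, ties) that the double sum visits and compares them with a brute-force list of all triples summing to n whose second count exceeds the first by at least n_d. In other words, TRI is the upper tail of that net difference. Because p_pos = p_neg, this tail has the same probability as the mirrored one.

```python
def tri_sum_cells(n, nd):
    """Hypothetical helper: triples (j, j + i, n - j - (j + i)) visited by the double sum."""
    cells = []
    for i in range(nd, n + 1):                 # net difference i = nd..n
        for j in range((n - i) // 2 + 1):      # j = 0..floor((n - i) / 2)
            cells.append((j, j + i, n - j - (j + i)))
    return cells


def upper_tail_cells(n, nd):
    """Hypothetical helper: all (a, b, c) with a + b + c = n and b - a >= nd."""
    return [
        (a, b, n - a - b)
        for a in range(n + 1)
        for b in range(n + 1 - a)
        if b - a >= nd
    ]


# the two index sets coincide for every small n and nd
for n in range(8):
    for nd in range(n + 1):
        assert sorted(tri_sum_cells(n, nd)) == sorted(upper_tail_cells(n, nd))

print(tri_sum_cells(2, 1))  # [(0, 1, 1), (0, 2, 0)]
```

The upper limit floor((n - i)/2) for j is exactly what keeps the tie count n - 2j - i from going negative.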
The probability mass function of the trinomial distribution is determined using the multinomial pmf function di_mpmf(), which for the trinomial distribution becomes:
$$\text{tri} \left(\left(a,b,c\right), \left(p_{pos},p_{neg},p_{0}\right)\right) = \frac{n!}{a!\times b!\times c!}\times p_{pos}^a\times p_{neg}^b\times p_0^c$$
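The trinomial pmf above can be written with the standard library alone. This is a standalone stand-in for di_mpmf() under the assumption that only the three-category case is needed. It is not stikpet's implementation, and the name `tri_pmf` is hypothetical.

```python
from math import factorial


def tri_pmf(counts, probs):
    """Hypothetical helper: n! / (a! b! c!) * p_pos^a * p_neg^b * p_0^c."""
    a, b, c = counts
    ppos, pneg, p0 = probs
    n = a + b + c
    # exact integer multinomial coefficient
    coef = factorial(n) // (factorial(a) * factorial(b) * factorial(c))
    # 0.0 ** 0 == 1.0 in Python, so a zero probability with a zero count is harmless
    return coef * ppos ** a * pneg ** b * p0 ** c


print(tri_pmf((1, 1, 0), (0.5, 0.5, 0.0)))  # 0.5
```

For large n the float result can overflow or underflow. Computing the terms in log space would avoid that, but that is beyond this sketch.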
With:
$$n_d = \left|n_{pos} - n_{neg}\right|$$
- \(n\), the total number of pairs (including the tied pairs)
- \(n_{pos}\), the number of pairs with a difference greater than the null hypothesis value
- \(n_{neg}\), the number of pairs with a difference less than the null hypothesis value
- \(n_{0}\), the number of pairs with a difference equal to the null hypothesis value
- \(d_{H0}\), the difference according to the null hypothesis, usually 0
- \(x_i\), the i-th score from the first variable
- \(y_i\), the i-th score from the second variable
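The full p-value computation can be sketched end to end from the three counts. This is a minimal sketch, not the stikpetP function. It takes n_pos, n_neg and n_0 directly instead of two pandas series, and it uses `math.factorial` in place of di_mpmf(). The names `tri_pmf` and `trinomial_ps_p_value` are hypothetical. It applies the same two-branch rule as the listed source: 2 × TRI when TRI < 0.5, otherwise 2 × (1 - TRI).

```python
from math import factorial


def tri_pmf(counts, probs):
    """Hypothetical helper: trinomial pmf for counts (a, b, c)."""
    a, b, c = counts
    n = a + b + c
    coef = factorial(n) // (factorial(a) * factorial(b) * factorial(c))
    return coef * probs[0] ** a * probs[1] ** b * probs[2] ** c


def trinomial_ps_p_value(npos, nneg, n0):
    """Hypothetical helper: two-sided trinomial-test p-value from the pair counts."""
    n = npos + nneg + n0                  # all pairs, ties included
    nd = abs(npos - nneg)                 # observed net difference n_d
    p0 = n0 / n
    ppos = pneg = (1 - p0) / 2
    probs = (ppos, pneg, p0)
    # TRI: sum over net differences i = nd..n and j = 0..floor((n - i) / 2)
    tri_sum = 0.0
    for i in range(nd, n + 1):
        for j in range((n - i) // 2 + 1):
            tri_sum += tri_pmf((j, j + i, n - j - (j + i)), probs)
    # two-branch rule as in the listed source
    if tri_sum < 0.5:
        return 2 * tri_sum
    return 2 * (1 - tri_sum)


print(trinomial_ps_p_value(2, 0, 0))  # 0.5
```

With no ties (n_0 = 0), p_0 is 0 and only the tie-free cells contribute, so the result matches the exact two-sided sign test. For example, trinomial_ps_p_value(5, 1, 0) gives 2 × 7/64 = 0.21875.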
References
Bian, G., McAleer, M., & Wong, W.-K. (2009). A trinomial test for paired data when there are many ties. SSRN Electronic Journal. doi:10.2139/ssrn.1410589
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076