Module stikpetP.tests.test_sign_ps
import pandas as pd
from scipy.stats import binom
from statistics import NormalDist
from ..other.table_cross import tab_cross
def ts_sign_ps(field1, field2, levels=None, dmu=0, method="exact"):
    '''
Sign Test (Paired Samples)
--------------------------
This test compares the number of pairs with a difference above the hypothesized difference to the number of pairs with a difference below it. It can be considered an alternative to the paired samples t-test.
Parameters
----------
field1 : pandas series
the ordinal or scale scores of the first variable
field2 : pandas series
the ordinal or scale scores of the second variable
levels : list or dictionary, optional
the levels from field1 and field2
dmu : float, optional
the hypothesized difference between each pair. Default is 0.
method : {"exact", "appr"}, optional
test to be used. Default is "exact".
Returns
-------
Dataframe with:
* *n pos*, the number of pairs with a difference above dmu
* *n neg*, the number of pairs with a difference below dmu
* *statistic*, the test statistic (only applicable if method="appr")
* *p-value*, the p-value (significance)
Notes
-----
If method="exact" the binomial distribution will be used. The formula used is (Dixon & Mood, 1946):
$$sig. = 2\\times Bin\\left(n, \\min\\left(n_{pos}, n_{neg}\\right), \\frac{1}{2}\\right)$$
When using the approximation, the standard normal distribution is used (SPSS, 2006, p. 483):
$$z = \\frac{\\max\\left(n_{pos}, n_{neg}\\right)-0.5\\times\\left(n_{pos} + n_{neg}\\right)-0.5}{0.5\\times\\sqrt{n_{pos}+n_{neg}}}$$
$$sig. = 2\\times\\left(1 - \\Phi\\left(\\left|z\\right|\\right)\\right)$$
With:
$$n_{pos}=\\sum_{i=1}^n \\begin{cases} 1 & \\text{ if } d_i>d_{H0} \\\\ 0 & \\text{ if } d_i\\leq d_{H0} \\end{cases}$$
$$n_{neg}=\\sum_{i=1}^n \\begin{cases} 1 & \\text{ if } d_i<d_{H0} \\\\ 0 & \\text{ if } d_i\\geq d_{H0} \\end{cases}$$
$$d_i = x_i - y_i$$
The test was already described by Arbuthnott (1710).
*Symbols used*
* \\(n\\), is the number of pairs with a difference unequal to the hypothesized difference, i.e. \\(n_{pos} + n_{neg}\\)
* \\(n_{pos}\\), the number of pairs with a difference greater than the difference according to the null hypothesis
* \\(n_{neg}\\), the number of pairs with a difference less than the difference according to the null hypothesis
* \\(d_{H0}\\), the difference according to the null hypothesis, usually 0
* \\(x_i\\), is the i-th score from the first variable
* \\(y_i\\), is the i-th score from the second variable
* \\(\\text{Bin}\\left(\\dots\\right)\\), is the binomial cumulative distribution function
* \\(\\Phi\\left(\\dots\\right)\\), is the cumulative distribution function of the standard normal distribution
References
----------
Arbuthnott, J. (1710). An argument for divine providence, taken from the constant regularity observed in the births of both sexes. *Philosophical Transactions of the Royal Society of London, 27*(328), 186–190. doi:10.1098/rstl.1710.0011
Dixon, W. J., & Mood, A. M. (1946). The statistical sign test. *Journal of the American Statistical Association, 41*(236), 557–566. doi:10.1080/01621459.1946.10501898
SPSS. (2006). SPSS 15.0 algorithms.
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
'''
    # cross table of the paired scores (rows: field1, columns: field2)
    ct = tab_cross(field1, field2, order1=levels, order2=levels)
    k1 = ct.shape[0]
    k2 = ct.shape[1]

    if levels is not None:
        # replace row and column labels with numeric scores 0, 1, 2, ...
        ct = ct.reset_index(drop=True)
        ct.columns = [i for i in range(0, k2)]

    # count the pairs above (npos) and below (nneg) the hypothesized difference
    npos = 0
    nneg = 0
    for i in range(0, k1):
        for j in range(0, k2):
            # skip the diagonal cells (same row and column position)
            if i != j:
                if ct.index[i] - ct.columns[j] > dmu:
                    npos = npos + ct.iloc[i, j]
                elif ct.index[i] - ct.columns[j] < dmu:
                    nneg = nneg + ct.iloc[i, j]

    nc = nneg + npos
    nMin = min(npos, nneg)
    nMax = max(npos, nneg)

    if method == "exact":
        # two-sided p-value from the binomial distribution
        z = None
        p = 2 * binom.cdf(nMin, nc, 0.5)
    elif method == "appr":
        # normal approximation with continuity correction
        z = (nMax - 0.5 * nc - 0.5) / (0.5 * (nc)**0.5)
        p = 2 * (1 - NormalDist().cdf(abs(z)))
    else:
        raise ValueError('method must be "exact" or "appr"')

    res = pd.DataFrame([[npos, nneg, z, p]])
    res.columns = ["n pos", "n neg", "statistic", "p-value"]

    return res
Functions
def ts_sign_ps(field1, field2, levels=None, dmu=0, method='exact')
Sign Test (Paired Samples)
This test compares the number of pairs with a difference above the hypothesized difference to the number of pairs with a difference below it. It can be considered an alternative to the paired samples t-test.
Parameters
- field1 : pandas series, the ordinal or scale scores of the first variable
- field2 : pandas series, the ordinal or scale scores of the second variable
- levels : list or dictionary, optional, the levels from field1 and field2
- dmu : float, optional, the hypothesized difference between each pair. Default is 0.
- method : {"exact", "appr"}, optional, test to be used. Default is "exact".
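The source code replaces the cross-table labels with the numeric scores 0, 1, 2, ... when `levels` is given, so that differences between ordinal categories can be computed. A minimal sketch of that idea on plain lists follows; the helper `to_scores` and the example labels are hypothetical and not part of the library, and only the list form of `levels` is covered.

```python
def to_scores(values, levels):
    """Map each ordinal label to its position in `levels` (0, 1, 2, ...)."""
    position = {label: i for i, label in enumerate(levels)}
    return [position[v] for v in values]


# hypothetical ordinal data, ordered from lowest to highest category
levels = ["low", "medium", "high"]
before = to_scores(["low", "high", "medium"], levels)
after = to_scores(["medium", "medium", "low"], levels)

# differences between the paired scores (first minus second)
differences = [b - a for b, a in zip(before, after)]
print(differences)
```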
Returns
Dataframe with:
- n pos, the number of pairs with a difference above dmu
- n neg, the number of pairs with a difference below dmu
- statistic, the test statistic (only applicable if method="appr")
- p-value, the p-value (significance)
Notes
If method="exact" the binomial distribution will be used. The formula used is (Dixon & Mood, 1946):
$$sig. = 2\times Bin\left(n, \min\left(n_{pos}, n_{neg}\right), \frac{1}{2}\right)$$
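As an illustration of the exact formula, the sketch below computes the doubled binomial tail with the standard library only. The function names `binom_cdf` and `sign_test_exact` are hypothetical; the module itself uses `scipy.stats.binom.cdf` for this step.

```python
from math import comb


def binom_cdf(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def sign_test_exact(n_pos, n_neg):
    """Two-sided exact p-value: 2 * Bin(n, min(n_pos, n_neg), 1/2)."""
    n = n_pos + n_neg
    return 2 * binom_cdf(min(n_pos, n_neg), n)


# hypothetical counts: 8 pairs above and 2 pairs below the hypothesized difference
p_value = sign_test_exact(8, 2)
print(p_value)
```

Note that the doubled tail can exceed 1 when the two counts are equal or nearly equal; the sketch returns the value unmodified, as in the formula.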
When using the approximation, the standard normal distribution is used (SPSS, 2006, p. 483):
$$z = \frac{\max\left(n_{pos}, n_{neg}\right)-0.5\times\left(n_{pos} + n_{neg}\right)-0.5}{0.5\times\sqrt{n_{pos}+n_{neg}}}$$
$$sig. = 2\times\left(1 - \Phi\left(\left|z\right|\right)\right)$$
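The approximation can be sketched in the same way, using `statistics.NormalDist` as the module does. The function name `sign_test_approx` and the example counts are hypothetical.

```python
from statistics import NormalDist


def sign_test_approx(n_pos, n_neg):
    """Normal approximation with continuity correction; returns (z, p-value)."""
    n = n_pos + n_neg
    z = (max(n_pos, n_neg) - 0.5 * n - 0.5) / (0.5 * n ** 0.5)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# hypothetical counts: 8 pairs above and 2 pairs below the hypothesized difference
z, p_value = sign_test_approx(8, 2)
print(z, p_value)
```

The subtraction of 0.5 in the numerator is the continuity correction that appears in the formula for z.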
With:
$$n_{pos}=\sum_{i=1}^n \begin{cases} 1 & \text{ if } d_i>d_{H0} \\ 0 & \text{ if } d_i\leq d_{H0} \end{cases}$$
$$n_{neg}=\sum_{i=1}^n \begin{cases} 1 & \text{ if } d_i<d_{H0} \\ 0 & \text{ if } d_i\geq d_{H0} \end{cases}$$
$$d_i = x_i - y_i$$
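A minimal sketch of the counting step on plain lists may help here: a pair counts as positive when its difference lies above the hypothesized difference, as negative when it lies below, and is left out when it equals it. The function `count_signs` and the data are hypothetical; the module itself works from a cross table rather than looping over the pairs.

```python
def count_signs(x, y, d_h0=0):
    """Count pairs with d_i = x_i - y_i above (n_pos) and below (n_neg) d_h0."""
    n_pos = 0
    n_neg = 0
    for x_i, y_i in zip(x, y):
        d_i = x_i - y_i
        if d_i > d_h0:
            n_pos += 1
        elif d_i < d_h0:
            n_neg += 1
        # pairs with d_i == d_h0 are counted in neither group
    return n_pos, n_neg


# hypothetical paired scores; the differences are 2, 0, 2, -1, 1, 1
x = [5, 3, 4, 4, 2, 5]
y = [3, 3, 2, 5, 1, 4]
print(count_signs(x, y))          # hypothesized difference of 0
print(count_signs(x, y, d_h0=1))  # hypothesized difference of 1
```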
The test was already described by Arbuthnott (1710).
Symbols used
- \(n\), is the number of pairs with a difference unequal to the hypothesized difference, i.e. \(n_{pos} + n_{neg}\)
- \(n_{pos}\), the number of pairs with a difference greater than the difference according to the null hypothesis
- \(n_{neg}\), the number of pairs with a difference less than the difference according to the null hypothesis
- \(d_{H0}\), the difference according to the null hypothesis, usually 0
- \(x_i\), is the i-th score from the first variable
- \(y_i\), is the i-th score from the second variable
- \(\text{Bin}\left(\dots\right)\), is the binomial cumulative distribution function
- \(\Phi\left(\dots\right)\), is the cumulative distribution function of the standard normal distribution
References
Arbuthnott, J. (1710). An argument for divine providence, taken from the constant regularity observed in the births of both sexes. Philosophical Transactions of the Royal Society of London, 27(328), 186–190. doi:10.1098/rstl.1710.0011
Dixon, W. J., & Mood, A. M. (1946). The statistical sign test. Journal of the American Statistical Association, 41(236), 557–566. doi:10.1080/01621459.1946.10501898
SPSS. (2006). SPSS 15.0 algorithms.
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076