Module `stikpetP.tests.test_sign_ps`

import pandas as pd
from scipy.stats import binom
from statistics import NormalDist
from ..other.table_cross import tab_cross

def ts_sign_ps(field1, field2, levels=None, dmu=0, method="exact"):
    '''
    Sign Test (Paired Samples)
    --------------------------
    This test compares the number of pairs whose difference lies above the hypothesized difference with the number of pairs whose difference lies below it. It can be considered an alternative to the paired samples t-test.
    
    Parameters
    ----------
    field1 : pandas series
        the ordinal or scale scores of the first variable
    field2 : pandas series
        the ordinal or scale scores of the second variable
    levels : list or dictionary, optional
        the ordered levels (categories) of field1 and field2. If given, the scores are replaced by the position of each level in this order.
    dmu : float, optional
        the hypothesized difference between each pair. Default is 0.
    method : {"exact", "appr"}, optional
        test to be used. Default is "exact".
        
    Returns
    -------
    Dataframe with:
    
    * *n pos*, the number of pairs with a difference above dmu
    * *n neg*, the number of pairs with a difference below dmu
    * *statistic*, the test statistic (only applicable if method="appr")
    * *p-value*, the p-value (significance)
    
    Notes
    -----
    If method="exact" the binomial distribution will be used. The formula used is (Dixon & Mood, 1946):
    $$sig. = 2\\times Bin\\left(n, \\min\\left(n_{pos}, n_{neg}\\right), \\frac{1}{2}\\right)$$
    
    When using the approximation, the standard normal distribution is used (SPSS, 2006, p. 483):
    $$z = \\frac{\\max\\left(n_{pos}, n_{neg}\\right)-0.5\\times\\left(n_{pos} + n_{neg}\\right)-0.5}{0.5\\times\\sqrt{n_{pos}+n_{neg}}}$$
    $$sig. = 2\\times\\left(1 - \\Phi\\left(\\left|z\\right|\\right)\\right)$$
    
    With:
    $$n_{pos}=\\sum_{i=1}^n \\begin{cases} 1 & \\text{ if } d_i>d_{H0} \\\\ 0 & \\text{ if } d_i\\leq d_{H0} \\end{cases}$$
    $$n_{neg}=\\sum_{i=1}^n \\begin{cases} 1 & \\text{ if } d_i<d_{H0} \\\\ 0 & \\text{ if } d_i\\geq d_{H0} \\end{cases}$$
    $$d_i = x_i - y_i$$
    
    The test was already described by Arbuthnott (1710).
    
    *Symbols used*
    
    * \\(n\\), is the number of pairs with a difference unequal to \\(d_{H0}\\), i.e. \\(n_{pos} + n_{neg}\\)
    * \\(n_{pos}\\), the number of pairs with a difference greater than the null hypothesis
    * \\(n_{neg}\\), the number of pairs with a difference less than the null hypothesis
    * \\(d_{H0}\\), the difference according to the null hypothesis, usually 0
    * \\(x_i\\), is the i-th score from the first variable
    * \\(y_i\\), is the i-th score from the second variable
    * \\(\\text{Bin}\\left(\\dots\\right)\\), is the binomial cumulative distribution function
    * \\(\\Phi\\left(\\dots\\right)\\), is the cumulative distribution function of the standard normal distribution

    References
    ----------
    Arbuthnott, J. (1710). An argument for divine providence, taken from the constant regularity observed in the births of both sexes. *Philosophical Transactions of the Royal Society of London, 27*(328), 186–190. doi:10.1098/rstl.1710.0011
    
    Dixon, W. J., & Mood, A. M. (1946). The statistical sign test. *Journal of the American Statistical Association, 41*(236), 557–566. doi:10.1080/01621459.1946.10501898
    
    SPSS. (2006). SPSS 15.0 algorithms.

    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076   
    
    
    '''
    
    # cross table of the paired scores
    ct = tab_cross(field1, field2, order1=levels, order2=levels)
    k1, k2 = ct.shape
    
    if levels is not None:
        # replace the row and column labels with numeric scores (position in levels)
        ct = ct.reset_index(drop=True)
        ct.columns = list(range(k2))
    
    # count the pairs with a difference above (npos) and below (nneg) dmu
    # the comparison uses the labels, so pairs with a difference equal to dmu count as ties
    npos = 0
    nneg = 0
    for i in range(k1):
        for j in range(k2):
            d = ct.index[i] - ct.columns[j]
            if d > dmu:
                npos = npos + ct.iloc[i, j]
            elif d < dmu:
                nneg = nneg + ct.iloc[i, j]
                
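    # number of pairs that are not tied with dmu
    nc = nneg + npos
    nMin = min(npos, nneg)
    nMax = max(npos, nneg)
    
    if method == "exact":
        # two-sided p-value from the binomial distribution
        z = None
        p = 2 * binom.cdf(nMin, nc, 0.5)
    elif method == "appr":
        # normal approximation with continuity correction
        z = (nMax - 0.5 * nc - 0.5) / (0.5 * (nc)**0.5)
        p = 2 * (1 - NormalDist().cdf(abs(z)))
    else:
        # without this branch an unknown method would fail later with a NameError
        raise ValueError('method should be "exact" or "appr"')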
    
    res = pd.DataFrame([[npos, nneg, z, p]])
    res.columns = ["n pos", "n neg", "statistic", "p-value"]
    
    return res

Functions

def ts_sign_ps(field1, field2, levels=None, dmu=0, method='exact')

Sign Test (Paired Samples)

This test compares the number of pairs whose difference lies above the hypothesized difference with the number of pairs whose difference lies below it. It can be considered an alternative to the paired samples t-test.

Parameters

field1 : pandas series: the ordinal or scale scores of the first variable
field2 : pandas series: the ordinal or scale scores of the second variable
levels : list or dictionary, optional: the ordered levels (categories) of field1 and field2. If given, the scores are replaced by the position of each level in this order.
dmu : float, optional: the hypothesized difference between each pair. Default is 0.
method : {"exact", "appr"}, optional: test to be used. Default is "exact".
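
The source replaces the row and column labels of the cross table with their positions when `levels` is given. The idea can be sketched with plain Python as below. This is only an illustration for a list of levels; the data and the names `before`, `after` and `score` are made up, and how the library treats a dictionary of levels is not shown here.

```python
# Hypothetical ordered levels and paired ordinal answers (illustrative data only)
levels = ["low", "medium", "high"]
before = ["low", "medium", "high", "low", "medium"]
after = ["medium", "medium", "low", "high", "low"]

# score each level by its position in the ordered list
score = {label: position for position, label in enumerate(levels)}

# difference per pair: first variable minus second variable
diffs = [score[a] - score[b] for a, b in zip(before, after)]
print(diffs)  # [-1, 0, 2, -2, 1]
```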

Returns

Dataframe with:

n pos, the number of pairs with a difference above dmu
n neg, the number of pairs with a difference below dmu
statistic, the test statistic (only applicable if method="appr")
p-value, the p-value (significance)

Notes

If method="exact" the binomial distribution will be used. The formula used is (Dixon & Mood, 1946):

$$sig. = 2\times Bin\left(n, \min\left(n_{pos}, n_{neg}\right), \frac{1}{2}\right)$$
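
As a rough check of the exact formula, the binomial cumulative probability can be computed by hand with the standard library. The helper name `sign_exact_p` is an assumption for this sketch and is not part of the package, which uses `scipy.stats.binom` instead.

```python
from math import comb

def sign_exact_p(n_pos, n_neg):
    """Two-sided exact sign test p-value: 2 * Bin(n, min(n_pos, n_neg), 0.5)."""
    n = n_pos + n_neg
    k = min(n_pos, n_neg)
    # cumulative binomial probability P(X <= k) with success probability 0.5
    cdf = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return 2 * cdf

print(sign_exact_p(8, 2))  # 0.109375
```

Note that doubling the tail can give a value above 1 when the two counts are equal. Capping the result at 1 is a common convention, but the source shown here does not do so.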

When using the approximation, the standard normal distribution is used (SPSS, 2006, p. 483):

$$z = \frac{\max\left(n_{pos}, n_{neg}\right)-0.5\times\left(n_{pos} + n_{neg}\right)-0.5}{0.5\times\sqrt{n_{pos}+n_{neg}}}$$

$$sig. = 2\times\left(1 - \Phi\left(\left|z\right|\right)\right)$$
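
The approximation can be worked through in the same way. The sketch below follows the two formulas above using only the standard library; the helper name `sign_appr` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def sign_appr(n_pos, n_neg):
    """Normal approximation of the sign test with a 0.5 continuity correction."""
    n = n_pos + n_neg
    z = (max(n_pos, n_neg) - 0.5 * n - 0.5) / (0.5 * sqrt(n))
    # two-sided p-value from the standard normal distribution
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

z, p = sign_appr(8, 2)
print(round(z, 4))  # 1.5811
```

With 8 positive and 2 negative differences, the numerator is 8 - 5 - 0.5 = 2.5 and the denominator is 0.5 times the square root of 10.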

With:

$$n_{pos}=\sum_{i=1}^n \begin{cases} 1 & \text{ if } d_i>d_{H0} \\ 0 & \text{ if } d_i\leq d_{H0} \end{cases}$$

$$n_{neg}=\sum_{i=1}^n \begin{cases} 1 & \text{ if } d_i<d_{H0} \\ 0 & \text{ if } d_i\geq d_{H0} \end{cases}$$

$$d_i = x_i - y_i$$
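
The counting step can also be illustrated directly on paired scores, without building a cross table first. This is a minimal sketch with made-up data; `count_signs` is a hypothetical helper, not a function of the package.

```python
def count_signs(x, y, dmu=0):
    """Count pairs with a difference above and below dmu (ties are skipped)."""
    n_pos = 0
    n_neg = 0
    for xi, yi in zip(x, y):
        d = xi - yi  # d_i = x_i - y_i
        if d > dmu:
            n_pos += 1
        elif d < dmu:
            n_neg += 1
    return n_pos, n_neg

# illustrative paired scores, differences are [2, 0, -1, 3, 1]
x = [5, 3, 4, 4, 2]
y = [3, 3, 5, 1, 1]
print(count_signs(x, y))         # (3, 1)
print(count_signs(x, y, dmu=1))  # (2, 2)
```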

The test was already described by Arbuthnott (1710).

Symbols used

$n$ , is the number of pairs with a difference unequal to $d_{H0}$, i.e. $n_{pos} + n_{neg}$
$n_{pos}$ , the number of pairs with a difference greater than the null hypothesis
$n_{neg}$ , the number of pairs with a difference less than the null hypothesis
$d_{H0}$ , the difference according to the null hypothesis, usually 0
$x_i$ , is the i-th score from the first variable
$y_i$ , is the i-th score from the second variable
$\text{Bin}\left(\dots\right)$ , is the binomial cumulative distribution function
$\Phi\left(\dots\right)$ , is the cumulative distribution function of the standard normal distribution

References

Arbuthnott, J. (1710). An argument for divine providence, taken from the constant regularity observed in the births of both sexes. Philosophical Transactions of the Royal Society of London, 27(328), 186–190. doi:10.1098/rstl.1710.0011

Dixon, W. J., & Mood, A. M. (1946). The statistical sign test. Journal of the American Statistical Association, 41(236), 557–566. doi:10.1080/01621459.1946.10501898

SPSS. (2006). SPSS 15.0 algorithms.

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
