Module stikpetP.other.poho_mcnemar_co

Expand source code
import pandas as pd
from scipy.stats import chi2
from scipy.stats import binom
from ..other.table_cross import tab_cross

def ph_mcnemar_co(field1, field2, categories=None, exact=False, cc=False):
    '''
    Post-Hoc McNemar Test - Collapsed
    ---------------------------------
    After a (McNemar-)Bowker test a post-hoc test can potentially locate where the changes occurred. This can be done using a McNemar test, which is simply the Bowker test for a 2x2 table.
    
    There are two variations: one is to compare each possible pair of categories (pairwise comparison), the other is to compare each category with all other categories combined (collapsed comparison). This function is for the collapsed version, see **ph_mcnemar_pw()** for the pairwise version.
    
    Instead of using the McNemar test it is also possible to use the binomial test, which will be used if exact is set to True.
    
    Parameters
    ----------
    field1 : list or pandas series
        the first categorical field
    field2 : list or pandas series
        the second categorical field
    categories : list or dictionary, optional
        order and/or selection for categories of field1 and field2
    exact : boolean, optional
        use of exact binomial distribution (default is False)
    cc : boolean, optional
        use of a continuity correction for the chi-square test, or a mid-p correction when exact=True (default is False)
    
    Returns
    -------
    A dataframe with:
    * *category*, the specific category compared to all other categories
    * *n*, the sample size
    * *statistic*, the chi-squared value (if applicable)
    * *df*, the degrees of freedom used in the test (if applicable)
    * *p-value*, the significance (p-value)
    * *adj. p-value*, the Bonferroni adjusted p-value
    
    Notes
    -----
    The formula used is (McNemar, 1947, p. 156):
    $$\\chi_{M}^2 = \\frac{\\left(F_{1,2} - F_{2,1}\\right)^2}{F_{1,2} + F_{2,1}}$$
    $$df = 1$$
    $$sig. = 1 - \\chi^2\\left(\\chi_M^2, df\\right)$$
    
    If a continuity correction is applied the formula changes to:
    $$\\chi_{M*}^2 = \\frac{\\left(\\left|F_{1,2} - F_{2,1}\\right| - 1\\right)^2}{F_{1,2} + F_{2,1}}$$
    
    The formula used for the binomial test is:
    $$sig. = 2\\times\\text{Bin}\\left(F_{1,2} + F_{2,1}, \\min\\left(F_{1,2}, F_{2,1}\\right), 0.5\\right)$$
    
    The formula used for the binomial test with a mid-p correction (applied when cc is set to True):
    $$sig. = 2\\times\\text{Bin}\\left(F_{1,2} + F_{2,1}, \\min\\left(F_{1,2}, F_{2,1}\\right), 0.5\\right) -\\text{bin}\\left(F_{1,2} + F_{2,1}, \\min\\left(F_{1,2}, F_{2,1}\\right), 0.5\\right)$$
    
    The number of tests is the number of categories ( \\(k\\) ). The adjusted p-value is then determined using a Bonferroni correction:
    $$sig._{adj} = \\begin{cases} sig. \\times k & \\text{ if } sig. \\times k \\leq 1 \\\\ 1 & \\text{ if } sig. \\times k > 1 \\end{cases}$$
    
    *Symbols used*
    
    * \\(F_{1,2}\\), the observed count of cases that scored category 1 on the first variable, and another category on the second.
    * \\(F_{2,1}\\), the observed count of cases that scored another category on the first variable, and category 1 on the second.
    * \\(k\\), the number of categories
    * \\(\\chi^2\\left(\\dots\\right)\\), the cumulative distribution function for the chi-square distribution.
    * \\(\\text{Bin}\\left(\\dots\\right)\\), the cumulative distribution function for the binomial distribution.
    * \\(\\text{bin}\\left(\\dots\\right)\\), the probability mass function for the binomial distribution.
    
    References
    ----------
    McNemar, Q. (1947). Note on the sampling error of the difference between correlated proportions or percentages. *Psychometrika, 12*(2), 153–157. doi:10.1007/BF02295996
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076
        
    '''
    #create the cross table
    ct = tab_cross(field1, field2, categories, categories, totals="include")    
    
    #basic counts
    k = ct.shape[0] - 1    # number of categories (the cross table includes a totals row/column)
    n = ct.iloc[k, k]      # grand total
    
    res = pd.DataFrame()
    for i in range(0, k):
        # collapse the k x k table into a 2x2 table: category i versus all other categories
        a = ct.iloc[i, i]         # category i on both fields
        b = ct.iloc[i, k] - a     # category i on field1, another category on field2 (F_{1,2})
        c = ct.iloc[k, i] - a     # another category on field1, category i on field2 (F_{2,1})
        d = n - a - b - c         # category i on neither field (not used by the test)
        
        res.at[i, 0] = ct.index[i]
        res.at[i, 1] = n
        
        if exact:
            # exact binomial test on the discordant pairs: b + c trials with p = 0.5
            minCount = min(b, c)
            pVal = 2*binom.cdf(minCount, b + c, 0.5)
            if cc:
                # mid-p correction: subtract the point probability once
                pVal = pVal - binom.pmf(minCount, b + c, 0.5)
            
            stat = None
            df = None
            
        else:            
            if cc:
                stat = (abs(b - c)-1)**2 / (b+c)
            else:
                stat = (b - c)**2 / (b+c)
            df = 1
            pVal = chi2.sf(stat, df)
            
        res.at[i, 2] = stat
        res.at[i, 3] = df
        res.at[i, 4] = pVal
        # Bonferroni adjustment, capped at 1
        res.at[i, 5] = min(pVal * k, 1)
            
    res.columns = ["category", "n", "statistic", "df", "p-value", "adj. p-value"]

    return res
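
To make the collapsing step in the loop above easier to follow, here is a small standalone sketch. It uses pandas.crosstab with margins instead of the package's own tab_cross, and the paired observations are made up purely for illustration.

import pandas as pd

# hypothetical paired observations
f1 = pd.Series(["a", "a", "b", "c", "b", "a"])
f2 = pd.Series(["b", "a", "c", "c", "a", "b"])

ct = pd.crosstab(f1, f2, margins=True)  # margins=True adds the total row and column
k = ct.shape[0] - 1                     # number of categories (excluding the margin)
n = ct.iloc[k, k]                       # grand total

i = 0                                   # collapse category 0 versus all other categories
a = ct.iloc[i, i]                       # category i on both fields
b = ct.iloc[i, k] - a                   # F_{1,2}
c = ct.iloc[k, i] - a                   # F_{2,1}
d = n - a - b - c                       # neither field in category i
print(a, b, c, d)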

Functions

def ph_mcnemar_co(field1, field2, categories=None, exact=False, cc=False)

Post-Hoc McNemar Test - Collapsed

After a (McNemar-)Bowker test a post-hoc test can potentially locate where the changes occurred. This can be done using a McNemar test, which is simply the Bowker test for a 2x2 table.

There are two variations: one is to compare each possible pair of categories (pairwise comparison), the other is to compare each category with all other categories combined (collapsed comparison). This function is for the collapsed version, see ph_mcnemar_pw() for the pairwise version.

Instead of using the McNemar test it is also possible to use the binomial test, which will be used if exact is set to True.
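
A minimal usage sketch (the paired data and the category labels are made up, and it is assumed the stikpetP package is installed):

import pandas as pd
from stikpetP.other.poho_mcnemar_co import ph_mcnemar_co

# hypothetical paired ratings, e.g. the same cases measured twice
before = pd.Series(["low", "low", "mid", "high", "mid", "low", "high", "mid"])
after = pd.Series(["mid", "low", "high", "high", "low", "mid", "high", "high"])

# one collapsed McNemar test per category (that category versus all the others)
res = ph_mcnemar_co(before, after, categories=["low", "mid", "high"], exact=False)
print(res)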

Parameters

field1 : list or pandas series
the first categorical field
field2 : list or pandas series
the second categorical field
categories : list or dictionary, optional
order and/or selection for categories of field1 and field2
exact : boolean, optional
use of exact binomial distribution (default is False)
cc : boolean, optional
use of a continuity correction for the chi-square test, or a mid-p correction when exact=True (default is False)

Returns

A dataframe with:
 
  • category, the specific category compared to all other categories
  • n, the sample size
  • statistic, the chi-squared value (if applicable)
  • df, the degrees of freedom used in the test (if applicable)
  • p-value, the significance (p-value)
  • adj. p-value, the Bonferroni adjusted p-value

Notes

The formula used is (McNemar, 1947, p. 156):
$$\chi_{M}^2 = \frac{\left(F_{1,2} - F_{2,1}\right)^2}{F_{1,2} + F_{2,1}}$$
$$df = 1$$
$$sig. = 1 - \chi^2\left(\chi_M^2, df\right)$$

If a continuity correction is applied the formula changes to:
$$\chi_{M*}^2 = \frac{\left(\left|F_{1,2} - F_{2,1}\right| - 1\right)^2}{F_{1,2} + F_{2,1}}$$

The formula used for the binomial test is:
$$sig. = 2\times\text{Bin}\left(F_{1,2} + F_{2,1}, \min\left(F_{1,2}, F_{2,1}\right), 0.5\right)$$

The formula used for the binomial test with a mid-p correction (applied when cc is set to True):
$$sig. = 2\times\text{Bin}\left(F_{1,2} + F_{2,1}, \min\left(F_{1,2}, F_{2,1}\right), 0.5\right) - \text{bin}\left(F_{1,2} + F_{2,1}, \min\left(F_{1,2}, F_{2,1}\right), 0.5\right)$$

The number of tests is the number of categories ( \(k\) ). The adjusted p-value is then determined using a Bonferroni correction:
$$sig._{adj} = \begin{cases} sig. \times k & \text{ if } sig. \times k \leq 1 \\ 1 & \text{ if } sig. \times k > 1 \end{cases}$$

Symbols used

  • \(F_{1,2}\), the observed count of cases that scored category 1 on the first variable, and another category on the second.
  • \(F_{2,1}\), the observed count of cases that scored another category on the first variable, and category 1 on the second.
  • \(k\), the number of categories
  • \(\chi^2\left(\dots\right)\), the cumulative distribution function for the chi-square distribution.
  • \(\text{Bin}\left(\dots\right)\), the cumulative distribution function for the binomial distribution.
  • \(\text{bin}\left(\dots\right)\), the probability mass function for the binomial distribution.
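
As a small check on these formulas, a sketch with made-up discordant counts (\(F_{1,2} = 12\), \(F_{2,1} = 5\) and \(k = 3\) are hypothetical values, not results from any data):

from scipy.stats import chi2, binom

F12, F21, k = 12, 5, 3       # hypothetical discordant counts and number of categories

# chi-square version (no continuity correction)
chi_m = (F12 - F21)**2 / (F12 + F21)
sig = chi2.sf(chi_m, 1)      # 1 - chi-square CDF with df = 1

# exact binomial version
sig_exact = 2*binom.cdf(min(F12, F21), F12 + F21, 0.5)

# mid-p correction
sig_midp = sig_exact - binom.pmf(min(F12, F21), F12 + F21, 0.5)

# Bonferroni adjusted p-value, capped at 1
sig_adj = min(sig*k, 1)
print(chi_m, sig, sig_exact, sig_midp, sig_adj)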

References

McNemar, Q. (1947). Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika, 12(2), 153–157. doi:10.1007/BF02295996

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
