Module stikpetP.other.poho_mcnemar_co
import pandas as pd
from scipy.stats import chi2
from scipy.stats import binom
from ..other.table_cross import tab_cross
def ph_mcnemar_co(field1, field2, categories=None, exact=False, cc=False):
    '''
    Post-Hoc McNemar Test - Collapsed
    ---------------------------------
    After a (McNemar-)Bowker test, a post-hoc test can potentially locate where the changes occurred. This can be done using a McNemar test, which is the Bowker test for 2x2 tables.
    
    There are two variations: compare each possible pair of categories (pairwise comparison), or compare each category with all other categories combined (collapsed comparison). This function is for the collapsed version; see **ph_mcnemar_pw()** for the pairwise version.
    
    Instead of using the McNemar test it is also possible to use the binomial test, which will be used if exact is set to True.
    
    Parameters
    ----------
    field1 : list or pandas series
        the first categorical field
    field2 : list or pandas series
        the second categorical field
    categories : list or dictionary, optional
        order and/or selection for categories of field1 and field2
    exact : boolean, optional
        use of exact binomial distribution (default is False)
    cc : boolean, optional
        use of a continuity correction; if exact is True this is the mid-p correction (default is False)
    
    Returns
    -------
    A dataframe with:
    * *category*, the specific category compared to all other categories
    * *n*, the sample size
    * *statistic*, the chi-squared value (if applicable)
    * *df*, the degrees of freedom used in the test (if applicable)
    * *p-value*, the significance (p-value)
    * *adj. p-value*, the Bonferroni adjusted p-value
    
    Notes
    -----
    The formula used is (McNemar, 1947, p. 156):
    $$\\chi_{M}^2 = \\frac{\\left(F_{1,2} - F_{2,1}\\right)^2}{F_{1,2} + F_{2,1}}$$
    $$df = 1$$
    $$sig. = 1 - \\chi^2\\left(\\chi_M^2, df\\right)$$
    
    If a continuity correction is applied the formula changes to:
    $$\\chi_{M*}^2 = \\frac{\\left(\\left|F_{1,2} - F_{2,1}\\right| - 1\\right)^2}{F_{1,2} + F_{2,1}}$$
    
    The formula used for the binomial test is:
    $$sig. = 2\\times\\text{Bin}\\left(F_{1,2} + F_{2,1}, \\min\\left(F_{1,2}, F_{2,1}\\right), 0.5\\right)$$
    
    The formula used for the binomial test with a mid-p correction:
    $$sig. = 2\\times\\text{Bin}\\left(F_{1,2} + F_{2,1}, \\min\\left(F_{1,2}, F_{2,1}\\right), 0.5\\right) -\\text{bin}\\left(F_{1,2} + F_{2,1}, \\min\\left(F_{1,2}, F_{2,1}\\right), 0.5\\right)$$
    
    The number of tests is the number of categories ( \\(k\\) ). The adjusted p-value is then determined using a Bonferroni correction:
    $$sig._{adj} = \\begin{cases} sig. \\times k & \\text{ if } sig. \\times k \\leq 1 \\\\ 1 & \\text{ if } sig. \\times k > 1 \\end{cases}$$
    
    *Symbols used*
    
    * \\(F_{1,2}\\), the observed count of cases that scored category 1 on the first variable, and another category on the second.
    * \\(F_{2,1}\\), the observed count of cases that scored another category on the first variable, and category 1 on the second.
    * \\(k\\), the number of categories
    * \\(\\chi^2\\left(\\dots\\right)\\), the cumulative distribution function for the chi-square distribution.
    * \\(\\text{Bin}\\left(\\dots\\right)\\), the cumulative distribution function for the binomial distribution.
    * \\(\\text{bin}\\left(\\dots\\right)\\), the probability mass function for the binomial distribution.
    
    References
    ----------
    McNemar, Q. (1947). Note on the sampling error of the difference between correlated proportions or percentages. *Psychometrika, 12*(2), 153–157. doi:10.1007/BF02295996
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076
        
    '''
    #create the cross table
    ct = tab_cross(field1, field2, categories, categories, totals="include")    
    
    #basic counts
    k = ct.shape[0]-1
    n = ct.iloc[k, k]
    
    res = pd.DataFrame()
    for i in range(0, k):
        a = ct.iloc[i,i]
        b = ct.iloc[i,k] - a
        c = ct.iloc[k, i] - a
        d = n - a - b - c
                
        res.at[i, 0] = ct.index[i]
        res.at[i,1] = n
        
        if exact:
            minCount = min(b,c)
            # number of trials is the number of discordant cases (b + c), as in the Notes formula
            pVal = 2*binom.cdf(minCount, b + c, 0.5)
            if cc:
                pVal = pVal - binom.pmf(minCount, b + c, 0.5)
            
            stat = None
            df = None
            
        else:            
            if cc:
                stat = (abs(b - c)-1)**2 / (b+c)
            else:
                stat = (b - c)**2 / (b+c)
            df = 1
            pVal = chi2.sf(stat, df)
            
        res.at[i,2] = stat
        res.at[i,3] = df
        res.at[i,4] = pVal
        res.at[i,5] = res.loc[i,4] * k
        if res.loc[i,5] > 1:
            res.loc[i,5] = 1
            
    res.columns = ["category", "n", "statistic", "df", "p-value", "adj. p-value"]
    return res
Functions
def ph_mcnemar_co(field1, field2, categories=None, exact=False, cc=False)

Post-Hoc McNemar Test - Collapsed

After a (McNemar-)Bowker test, a post-hoc test can potentially locate where the changes occurred. This can be done using a McNemar test, which is the Bowker test for 2x2 tables.

There are two variations: compare each possible pair of categories (pairwise comparison), or compare each category with all other categories combined (collapsed comparison). This function is for the collapsed version; see ph_mcnemar_pw() for the pairwise version.

Instead of using the McNemar test it is also possible to use the binomial test, which will be used if exact is set to True.

Parameters
- field1 : list or pandas series
  the first categorical field
- field2 : list or pandas series
  the second categorical field
- categories : list or dictionary, optional
  order and/or selection for categories of field1 and field2
- exact : boolean, optional
  use of exact binomial distribution (default is False)
- cc : boolean, optional
  use of a continuity correction; if exact is True this is the mid-p correction (default is False)
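
The collapsed comparison described above reduces the full cross table to two discordant counts per category. The following is a minimal sketch of that step only, assuming the cross table is a plain list of rows without totals; the helper name `collapse_counts` and the example counts are made up for illustration and are not part of the library.

```python
def collapse_counts(table, i):
    """Return (b, c) for category i of a square cross table given as a list of rows.

    b: cases in category i on the first variable and another category on the second.
    c: cases in another category on the first variable and category i on the second.
    """
    a = table[i][i]                        # same category on both variables
    b = sum(table[i]) - a                  # row total minus the diagonal cell
    c = sum(row[i] for row in table) - a   # column total minus the diagonal cell
    return b, c


# made-up 3x3 table: rows = first variable, columns = second variable
example = [
    [20, 5, 3],
    [2, 30, 4],
    [1, 6, 25],
]
pairs = [collapse_counts(example, i) for i in range(len(example))]
print(pairs)
```

Each pair then feeds one McNemar (or binomial) test, so a table with k categories yields k tests.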
Returns
A dataframe with:
- category, the specific category compared to all other categories
- n, the sample size
- statistic, the chi-squared value (if applicable)
- df, the degrees of freedom used in the test (if applicable)
- p-value, the significance (p-value)
- adj. p-value, the Bonferroni adjusted p-value
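
The adj. p-value column can be reproduced by hand from the p-value column. This is a small sketch of a Bonferroni adjustment, assuming one p-value per category so that the number of tests equals the length of the list; the function name `bonferroni` and the input values are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests and cap the result at 1."""
    k = len(p_values)  # one test per category
    return [min(1.0, p * k) for p in p_values]


adjusted = bonferroni([0.01, 0.2, 0.5])
print(adjusted)
```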
Notes
The formula used is (McNemar, 1947, p. 156):
$$\chi_{M}^2 = \frac{\left(F_{1,2} - F_{2,1}\right)^2}{F_{1,2} + F_{2,1}}$$
$$df = 1$$
$$sig. = 1 - \chi^2\left(\chi_M^2, df\right)$$

If a continuity correction is applied the formula changes to:
$$\chi_{M*}^2 = \frac{\left(\left|F_{1,2} - F_{2,1}\right| - 1\right)^2}{F_{1,2} + F_{2,1}}$$

The formula used for the binomial test is:
$$sig. = 2\times\text{Bin}\left(F_{1,2} + F_{2,1}, \min\left(F_{1,2}, F_{2,1}\right), 0.5\right)$$

The formula used for the binomial test with a mid-p correction:
$$sig. = 2\times\text{Bin}\left(F_{1,2} + F_{2,1}, \min\left(F_{1,2}, F_{2,1}\right), 0.5\right) - \text{bin}\left(F_{1,2} + F_{2,1}, \min\left(F_{1,2}, F_{2,1}\right), 0.5\right)$$

The number of tests is the number of categories ( \(k\) ). The adjusted p-value is then determined using a Bonferroni correction:
$$sig._{adj} = \begin{cases} sig. \times k & \text{ if } sig. \times k \leq 1 \\ 1 & \text{ if } sig. \times k > 1 \end{cases}$$

Symbols used
- \(F_{1,2}\), the observed count of cases that scored category 1 on the first variable, and another category on the second.
- \(F_{2,1}\), the observed count of cases that scored another category on the first variable, and category 1 on the second.
- \(k\), the number of categories
- \(\chi^2\left(\dots\right)\), the cumulative distribution function for the chi-square distribution.
- \(\text{Bin}\left(\dots\right)\), the cumulative distribution function for the binomial distribution.
- \(\text{bin}\left(\dots\right)\), the probability mass function for the binomial distribution.
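
To make the formulas in the Notes concrete, here is a hedged sketch that evaluates them for a single pair of discordant counts. It uses only the Python standard library rather than scipy: for df = 1 the chi-square upper tail equals `erfc(sqrt(stat / 2))`, and the binomial terms are summed directly. The function names `mcnemar_chi2` and `mcnemar_exact` and the counts 8 and 3 are assumptions for illustration, and the sketch assumes at least one discordant case (the sum of the two counts is above zero).

```python
from math import comb, erfc, sqrt


def mcnemar_chi2(b, c, cc=False):
    """Chi-square statistic and p-value (df = 1) from the two discordant counts."""
    diff = abs(b - c) - 1 if cc else b - c
    stat = diff ** 2 / (b + c)
    # for df = 1 the chi-square upper tail equals erfc(sqrt(stat / 2))
    return stat, erfc(sqrt(stat / 2))


def mcnemar_exact(b, c, mid_p=False):
    """Two-sided binomial p-value with b + c trials and success probability 0.5."""
    trials, low = b + c, min(b, c)

    def pmf(x):
        return comb(trials, x) * 0.5 ** trials

    p = 2 * sum(pmf(x) for x in range(low + 1))  # 2 x cumulative probability
    return p - pmf(low) if mid_p else p          # mid-p subtracts one point mass


stat, p_asym = mcnemar_chi2(8, 3)
stat_cc, p_cc = mcnemar_chi2(8, 3, cc=True)
p_exact = mcnemar_exact(8, 3)
p_mid = mcnemar_exact(8, 3, mid_p=True)
print(stat, p_asym, stat_cc, p_cc, p_exact, p_mid)
```

With these counts the continuity correction lowers the statistic and therefore raises the p-value, and the mid-p value is smaller than the plain exact value by one point mass.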
References
McNemar, Q. (1947). Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika, 12(2), 153–157. doi:10.1007/BF02295996

Author
Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076