Module stikpetP.tests.test_mcnemar_bowker

import pandas as pd
from scipy.stats import chi2
from ..other.table_cross import tab_cross

def ts_mcnemar_bowker(field1, field2, categories=None, cc=False):
    '''
    (McNemar-)Bowker Test
    -------------------
    The Bowker test (Bowker, 1948) is an extension of the McNemar (1947) test, which was only for 2x2 tables.
    
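    It tests whether the cross table of two paired categorical fields is symmetric, i.e. whether changes from category i to category j are as frequent as changes from category j to category i. The null hypothesis is symmetry; if the p-value is below a pre-set threshold (usually 0.05), this hypothesis is rejected.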
    
    Parameters
    ----------
    field1 : list or pandas series
        the first categorical field
    field2 : list or pandas series
        the second categorical field
    categories : list or dictionary, optional
        order and/or selection for categories of field1 and field2
    cc : boolean, optional
        use of a continuity correction (default is False)
    
    Returns
    -------
    * *n*, the sample size
    * *statistic*, the chi-squared value
    * *df*, the degrees of freedom used in the test
    * *p-value*, the significance (p-value)
    
    Notes
    -----
    The formula used is (Bowker, 1948, p. 573):
    $$\\chi_{B}^2 = \\sum_{i=1}^{r-1} \\sum_{j=i+1}^{c} \\frac{\\left(F_{i,j}-F_{j,i}\\right)^2}{F_{i,j}+F_{j,i}} $$
    $$df = \\frac{r\\times \\left(r - 1\\right)}{2}$$
    $$sig. = 1 - \\chi^2\\left(\\chi_B^2, df\\right)$$
    
    *Symbols used:*
    
    * \\(r\\), is the number of rows (categories in the first variable)
    * \\(c\\), is the number of columns (categories in the second variable)
    * \\(n\\), is the total number of scores
    * \\(F_{i,j}\\), is the frequency (count) of scores equal to the i-th category in the first variable, and the j-th category in the second.
    * \\(\\chi^2\\left(\\dots\\right)\\), the cumulative distribution function for the chi-square distribution.
    
    References
    ----------
    Bowker, A. H. (1948). A test for symmetry in contingency tables. *Journal of the American Statistical Association, 43*(244), 572–574. doi:10.2307/2280710
    
    McNemar, Q. (1947). Note on the sampling error of the difference between correlated proportions or percentages. *Psychometrika, 12*(2), 153–157. doi:10.1007/BF02295996
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076
    
    '''
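    # create the cross table (square, with a totals row and column)
    ct = tab_cross(field1, field2, categories, categories, totals="include")

    # basic counts: k categories; the grand total sits in the bottom-right cell
    k = ct.shape[0] - 1
    n = ct.iloc[k, k]

    # sum over every pair of off-diagonal cells (i, j) and (j, i)
    chiVal = 0
    for i in range(k - 1):
        for j in range(i + 1, k):
            diff = ct.iloc[i, j] - ct.iloc[j, i]
            total = ct.iloc[i, j] + ct.iloc[j, i]
            if total == 0:
                # no observations in either cell: skip to avoid 0/0
                continue
            if cc:
                # continuity correction on the absolute difference
                diff = abs(diff) - 1
            chiVal = chiVal + diff**2 / total

    df = k * (k - 1) / 2
    pvalue = chi2.sf(chiVal, df)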
    
    #results
    colNames = ["n", "statistic", "df", "p-value"]
    results = pd.DataFrame([[n, chiVal, df, pvalue]], columns=colNames)
    
    return results

Functions

def ts_mcnemar_bowker(field1, field2, categories=None, cc=False)

(McNemar-)Bowker Test

The Bowker test (Bowker, 1948) is an extension of the McNemar (1947) test, which was only for 2x2 tables.

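It tests whether the cross table of two paired categorical fields is symmetric, i.e. whether changes from category i to category j are as frequent as changes from category j to category i. The null hypothesis is symmetry; if the p-value is below a pre-set threshold (usually 0.05), this hypothesis is rejected.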
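
As a quick check of the "extension" claim, using the statistic and degrees of freedom given under Notes: a 2x2 table has only one off-diagonal pair, so the double sum reduces to a single term. That term is the uncorrected McNemar statistic, with one degree of freedom:

$$\chi_{B}^2 = \frac{\left(F_{1,2}-F_{2,1}\right)^2}{F_{1,2}+F_{2,1}}, \quad df = 1$$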

Parameters

field1 : list or pandas series
the first categorical field
field2 : list or pandas series
the second categorical field
categories : list or dictionary, optional
order and/or selection for categories of field1 and field2
cc : boolean, optional
use of a continuity correction (default is False)
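
The docstring does not spell out the continuity correction. Reading the source code, with cc=True each term subtracts 1 from the absolute difference before squaring. A sketch of the corrected statistic, using the symbols defined under Notes (the subscript cc is a label introduced here, not notation from the source):

$$\chi_{B,cc}^2 = \sum_{i=1}^{r-1} \sum_{j=i+1}^{c} \frac{\left(\left|F_{i,j}-F_{j,i}\right|-1\right)^2}{F_{i,j}+F_{j,i}}$$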

Returns

  • n, the sample size
  • statistic, the chi-squared value
  • df, the degrees of freedom used in the test
  • p-value, the significance (p-value)

Notes

The formula used is (Bowker, 1948, p. 573):

$$\chi_{B}^2 = \sum_{i=1}^{r-1} \sum_{j=i+1}^{c} \frac{\left(F_{i,j}-F_{j,i}\right)^2}{F_{i,j}+F_{j,i}}$$

$$df = \frac{r\times \left(r - 1\right)}{2}$$

$$sig. = 1 - \chi^2\left(\chi_B^2, df\right)$$

Symbols used:

  • r, is the number of rows (categories in the first variable)
  • c, is the number of columns (categories in the second variable)
  • n, is the total number of scores
  • F_{i,j}, is the frequency (count) of scores equal to the i-th category in the first variable, and the j-th category in the second.
  • \chi^2\left(\dots\right), the cumulative distribution function for the chi-square distribution.
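
To make the double sum concrete, here is a minimal standard-library sketch that computes n, the statistic and df for a made-up 3x3 table. The helper name `bowker_statistic`, the example data, and the choice to skip pairs where both cells are empty are assumptions for illustration; this is not the package's own implementation, which builds the table with `tab_cross`.

```python
from collections import Counter


def bowker_statistic(field1, field2, categories, cc=False):
    """Return (n, statistic, df) for two paired categorical fields."""
    # count each (first, second) pair, keeping only the listed categories
    counts = Counter(
        (a, b) for a, b in zip(field1, field2)
        if a in categories and b in categories
    )
    k = len(categories)
    n = sum(counts.values())

    statistic = 0.0
    for i in range(k - 1):
        for j in range(i + 1, k):
            f_ij = counts[(categories[i], categories[j])]
            f_ji = counts[(categories[j], categories[i])]
            if f_ij + f_ji == 0:
                # assumption: skip empty pairs to avoid dividing by zero
                continue
            diff = abs(f_ij - f_ji) - 1 if cc else f_ij - f_ji
            statistic += diff ** 2 / (f_ij + f_ji)

    df = k * (k - 1) // 2
    return n, statistic, df


# made-up paired ratings (before, after): number of respondents per pair
levels = ["low", "mid", "high"]
table = {
    ("low", "low"): 10, ("low", "mid"): 5, ("low", "high"): 4,
    ("mid", "low"): 1, ("mid", "mid"): 10, ("mid", "high"): 6,
    ("high", "low"): 2, ("high", "mid"): 2, ("high", "high"): 10,
}
before, after = [], []
for (first, second), count in table.items():
    before += [first] * count
    after += [second] * count

n, statistic, df = bowker_statistic(before, after, levels)
n_cc, statistic_cc, df_cc = bowker_statistic(before, after, levels, cc=True)
print(n, round(statistic, 4), df)
print(round(statistic_cc, 4))
```

For this table the uncorrected statistic is 16/6 + 4/6 + 16/8, about 5.33, with df = 3. The source code then obtains the p-value with `chi2.sf(statistic, df)` from `scipy.stats`.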

References

Bowker, A. H. (1948). A test for symmetry in contingency tables. Journal of the American Statistical Association, 43(244), 572–574. doi:10.2307/2280710

McNemar, Q. (1947). Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika, 12(2), 153–157. doi:10.1007/BF02295996

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
