Module stikpetP.tests.test_mcnemar_bowker
import pandas as pd
from scipy.stats import chi2
from ..other.table_cross import tab_cross

def ts_mcnemar_bowker(field1, field2, categories=None, cc=False):
    '''
    (McNemar-)Bowker Test
    ---------------------
    The Bowker test (Bowker, 1948) is an extension of the McNemar (1947) test, which was only for 2x2 tables.
    It tests whether changes between categories are symmetric: the count moving from category i to category j is compared with the count moving from j to i. The null hypothesis is that the table is symmetric; if the p-value is below a pre-set threshold (usually 0.05), this hypothesis is rejected.

    Parameters
    ----------
    field1 : list or pandas series
        the first categorical field
    field2 : list or pandas series
        the second categorical field
    categories : list or dictionary, optional
        order and/or selection for categories of field1 and field2
    cc : boolean, optional
        use of a continuity correction (default is False)

    Returns
    -------
    * *n*, the sample size
    * *statistic*, the chi-squared value
    * *df*, the degrees of freedom used in the test
    * *p-value*, the significance (p-value)

    Notes
    -----
    The formula used is (Bowker, 1948, p. 573):
    $$\\chi_{B}^2 = \\sum_{i=1}^{r-1} \\sum_{j=i+1}^{c} \\frac{\\left(F_{i,j}-F_{j,i}\\right)^2}{F_{i,j}+F_{j,i}} $$
    $$df = \\frac{r\\times \\left(r - 1\\right)}{2}$$
    $$sig. = 1 - \\chi^2\\left(\\chi_B^2, df\\right)$$

    *Symbols used:*

    * \\(r\\), is the number of rows (categories in the first variable)
    * \\(c\\), is the number of columns (categories in the second variable); the table is square, so \\(c = r\\)
    * \\(n\\), is the total number of scores
    * \\(F_{i,j}\\), is the frequency (count) of scores equal to the i-th category in the first variable, and the j-th category in the second.
    * \\(\\chi^2\\left(\\dots\\right)\\), the cumulative distribution function for the chi-square distribution.

    References
    ----------
    Bowker, A. H. (1948). A test for symmetry in contingency tables. *Journal of the American Statistical Association, 43*(244), 572–574. doi:10.2307/2280710

    McNemar, Q. (1947). Note on the sampling error of the difference between correlated proportions or percentages. *Psychometrika, 12*(2), 153–157. doi:10.1007/BF02295996

    Author
    ------
    Made by P. Stikker

    Companion website: https://PeterStatistics.com
    YouTube channel: https://www.youtube.com/stikpet
    Donations: https://www.patreon.com/bePatron?u=19398076
    '''
    # create the cross table; the same categories are used for rows and
    # columns so the table is square, and totals add one extra row and column
    ct = tab_cross(field1, field2, categories, categories, totals="include")

    # basic counts: k categories, n is the grand total in the bottom-right cell
    k = ct.shape[0] - 1
    n = ct.iloc[k, k]

    # sum over every pair of cells mirrored across the diagonal
    chiVal = 0
    for i in range(0, k - 1):
        for j in range(i + 1, k):
            diff = ct.iloc[i, j] - ct.iloc[j, i]
            total = ct.iloc[i, j] + ct.iloc[j, i]
            # note: a pair with no observations in either cell gives a zero denominator
            if cc:
                # continuity correction: reduce the absolute difference by 1
                chiVal = chiVal + (abs(diff) - 1)**2 / total
            else:
                chiVal = chiVal + diff**2 / total

    # one degree of freedom per off-diagonal pair
    df = k * (k - 1) / 2
    # upper-tail probability, equal to 1 - cdf
    pvalue = chi2.sf(chiVal, df)

    # results
    colNames = ["n", "statistic", "df", "p-value"]
    results = pd.DataFrame([[n, chiVal, df, pvalue]], columns=colNames)

    return results
Functions
def ts_mcnemar_bowker(field1, field2, categories=None, cc=False)
(McNemar-)Bowker Test
The Bowker test (Bowker, 1948) is an extension of the McNemar (1947) test, which was only for 2x2 tables.
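It tests whether changes between categories are symmetric: the count moving from category i to category j is compared with the count moving from j to i. The null hypothesis is that the table is symmetric; if the p-value is below a pre-set threshold (usually 0.05), this hypothesis is rejected.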
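To make the idea of symmetry concrete, here is a minimal sketch of how two paired categorical fields turn into the square table the test works on. It uses only the standard library; `square_table` and the sample data are hypothetical stand-ins for illustration, not the library's `tab_cross`.

```python
from collections import Counter

def square_table(field1, field2, categories=None):
    """Count paired scores into a square table (list of rows).

    Hypothetical helper: rows follow field1, columns follow field2,
    and both use the same category order so the table is square.
    """
    if categories is None:
        categories = sorted(set(field1) | set(field2))
    counts = Counter(zip(field1, field2))
    return [[counts[(a, b)] for b in categories] for a in categories]

# made-up paired answers from the same six respondents
before = ["yes", "yes", "no", "maybe", "no", "yes"]
after = ["yes", "no", "no", "yes", "no", "maybe"]
after[4] = "maybe"  # the fifth respondent moved from "no" to "maybe"

table = square_table(before, after, ["yes", "maybe", "no"])
for row in table:
    print(row)
```

Under symmetry, the count in row i, column j would be about the same as the count in row j, column i; the diagonal (no change) plays no role in the test.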
Parameters
field1 : list or pandas series - the first categorical field
field2 : list or pandas series - the second categorical field
categories : list or dictionary, optional - order and/or selection for categories of field1 and field2
cc : boolean, optional - use of a continuity correction (default is False)
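The effect of `cc` can be illustrated on a single pair of mirrored cells. This is a sketch based on the source code of this function, where the correction subtracts 1 from the absolute difference before squaring; `pair_term` is a hypothetical helper name, not part of the library.

```python
def pair_term(f_ij, f_ji, cc=False):
    """Contribution of one mirrored pair of cells to the test statistic.

    Hypothetical helper mirroring the loop body of ts_mcnemar_bowker.
    """
    diff = abs(f_ij - f_ji)
    if cc:
        # continuity correction as in the source: |F_ij - F_ji| - 1
        diff = diff - 1
    return diff ** 2 / (f_ij + f_ji)

print(pair_term(10, 4))           # 36 / 14
print(pair_term(10, 4, cc=True))  # 25 / 14
```

One consequence worth knowing: with this form of the correction, a perfectly balanced pair (equal counts) still contributes `1 / (f_ij + f_ji)` rather than 0, because `(0 - 1)**2` is 1.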
Returns
- n, the sample size
- statistic, the chi-squared value
- df, the degrees of freedom used in the test
- p-value, the significance (p-value)
Notes
The formula used is (Bowker, 1948, p. 573):
$$\chi_{B}^2 = \sum_{i=1}^{r-1} \sum_{j=i+1}^{c} \frac{\left(F_{i,j}-F_{j,i}\right)^2}{F_{i,j}+F_{j,i}}$$
$$df = \frac{r\times \left(r - 1\right)}{2}$$
$$sig. = 1 - \chi^2\left(\chi_B^2, df\right)$$
Symbols used:
- \(r\), is the number of rows (categories in the first variable)
- \(c\), is the number of columns (categories in the second variable)
- \(n\), is the total number of scores
- \(F_{i,j}\), is the frequency (count) of scores equal to the i-th category in the first variable, and the j-th category in the second.
- \(\chi^2\left(\dots\right)\), the cumulative distribution function for the chi-square distribution.
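As a worked example of the formulas in these notes, the sketch below computes the statistic, the degrees of freedom and the significance for a made-up 3x3 table. It is a standard-library illustration, not the library's implementation: `bowker_statistic` and `chi2_sf` are hypothetical names, and `chi2_sf` replaces `scipy.stats.chi2.sf` with the recurrence for the regularized upper incomplete gamma function, which is valid for whole-number degrees of freedom.

```python
import math

def bowker_statistic(table):
    """Return (statistic, df) for a square table given as a list of rows."""
    k = len(table)
    stat = 0.0
    for i in range(k - 1):
        for j in range(i + 1, k):
            diff = table[i][j] - table[j][i]
            total = table[i][j] + table[j][i]
            stat += diff ** 2 / total
    # one degree of freedom per off-diagonal pair
    return stat, k * (k - 1) // 2

def chi2_sf(x, df):
    """Upper-tail chi-square probability (1 - cdf) for integer df."""
    half_x = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half_x)             # Q(1, x/2)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half_x))  # Q(1/2, x/2)
    # recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / gamma(a + 1)
    while a < df / 2:
        q += half_x ** a * math.exp(-half_x) / math.gamma(a + 1)
        a += 1
    return q

# made-up counts; rows are the first variable, columns the second
table = [[20, 10, 5],
         [4, 30, 8],
         [7, 2, 25]]

stat, df = bowker_statistic(table)
p = chi2_sf(stat, df)
print(stat, df, p)
```

The three pairs contribute 36/14, 4/12 and 36/10, so the statistic is about 6.505 with 3 degrees of freedom. That is below the 0.05 critical value of about 7.815, so symmetry would not be rejected at that threshold for these made-up counts.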
References
Bowker, A. H. (1948). A test for symmetry in contingency tables. Journal of the American Statistical Association, 43(244), 572–574. doi:10.2307/2280710
McNemar, Q. (1947). Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika, 12(2), 153–157. doi:10.1007/BF02295996
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076