Module stikpetP.other.poho_mcnemar_co
import pandas as pd
from scipy.stats import chi2
from scipy.stats import binom
from ..other.table_cross import tab_cross
def ph_mcnemar_co(field1, field2, categories=None, exact=False, cc=False):
    '''
    Post-Hoc McNemar Test - Collapsed
    ---------------------------------
    After a (McNemar-)Bowker test, a post-hoc test can potentially locate where the changes occurred. This can be done using a McNemar test, which is the Bowker test but for 2x2 tables.
    There are two variations: either compare each possible pair of categories (pairwise comparison), or compare each category with all other categories combined (collapsed comparison). This function is for the collapsed version, see **ph_mcnemar_pw()** for the pairwise version.
    Instead of using the McNemar test it is also possible to use the binomial test, which will be used if exact is set to True.
    Parameters
    ----------
    field1 : list or pandas series
        the first categorical field
    field2 : list or pandas series
        the second categorical field
    categories : list or dictionary, optional
        order and/or selection for categories of field1 and field2
    exact : boolean, optional
        use of exact binomial distribution (default is False)
    cc : boolean, optional
        use of a continuity correction (default is False)
    Returns
    -------
    A dataframe with:
    * *category*, the specific category compared to all other categories
    * *n*, the sample size
    * *statistic*, the chi-squared value (if applicable)
    * *df*, the degrees of freedom used in the test (if applicable)
    * *p-value*, the significance (p-value)
    * *adj. p-value*, the Bonferroni adjusted p-value
    Notes
    -----
    The formula used is (McNemar, 1947, p. 156):
    $$\\chi_{M}^2 = \\frac{\\left(F_{1,2} - F_{2,1}\\right)^2}{F_{1,2} + F_{2,1}}$$
    $$df = 1$$
    $$sig. = 1 - \\chi^2\\left(\\chi_M^2, df\\right)$$
    If a continuity correction is applied the formula changes to:
    $$\\chi_{M*}^2 = \\frac{\\left(\\left|F_{1,2} - F_{2,1}\\right| - 1\\right)^2}{F_{1,2} + F_{2,1}}$$
    The formula used for the binomial test is:
    $$sig. = 2\\times\\text{Bin}\\left(F_{1,2} + F_{2,1}, \\min\\left(F_{1,2}, F_{2,1}\\right), 0.5\\right)$$
    The formula used for the binomial test with a mid-p correction:
    $$sig. = 2\\times\\text{Bin}\\left(F_{1,2} + F_{2,1}, \\min\\left(F_{1,2}, F_{2,1}\\right), 0.5\\right) - \\text{bin}\\left(F_{1,2} + F_{2,1}, \\min\\left(F_{1,2}, F_{2,1}\\right), 0.5\\right)$$
    The number of tests is the number of categories ( \\(k\\) ). The adjusted p-value is then determined using a Bonferroni correction:
    $$sig._{adj} = \\begin{cases} sig. \\times k & \\text{ if } sig. \\times k \\leq 1 \\\\ 1 & \\text{ if } sig. \\times k > 1 \\end{cases}$$
    *Symbols used*
    * \\(F_{1,2}\\), the observed count of cases that scored category 1 on the first variable, and another category on the second.
    * \\(F_{2,1}\\), the observed count of cases that scored another category on the first variable, and category 1 on the second.
    * \\(k\\), the number of categories
    * \\(\\chi^2\\left(\\dots\\right)\\), the cumulative distribution function for the chi-square distribution.
    * \\(\\text{Bin}\\left(\\dots\\right)\\), the cumulative distribution function for the binomial distribution.
    * \\(\\text{bin}\\left(\\dots\\right)\\), the probability mass function for the binomial distribution.
    References
    ----------
    McNemar, Q. (1947). Note on the sampling error of the difference between correlated proportions or percentages. *Psychometrika, 12*(2), 153–157. doi:10.1007/BF02295996
    Author
    ------
    Made by P. Stikker
    Companion website: https://PeterStatistics.com
    YouTube channel: https://www.youtube.com/stikpet
    Donations: https://www.patreon.com/bePatron?u=19398076
    '''
    #create the cross table
    ct = tab_cross(field1, field2, categories, categories, totals="include")
    #basic counts
    k = ct.shape[0] - 1
    n = ct.iloc[k, k]
    res = pd.DataFrame()
    for i in range(0, k):
        #collapse the k x k table into a 2x2 table: category i versus all other categories
        a = ct.iloc[i, i]
        b = ct.iloc[i, k] - a
        c = ct.iloc[k, i] - a
        d = n - a - b - c
        res.at[i, 0] = ct.index[i]
        res.at[i, 1] = n
        if exact:
            #exact binomial test on the discordant counts, i.e. b + c trials (see the formula in the notes)
            minCount = min(b, c)
            pVal = 2*binom.cdf(minCount, b + c, 0.5)
            if cc:
                #mid-p correction
                pVal = pVal - binom.pmf(minCount, b + c, 0.5)
            stat = None
            df = None
        else:
            if cc:
                stat = (abs(b - c) - 1)**2 / (b + c)
            else:
                stat = (b - c)**2 / (b + c)
            df = 1
            pVal = chi2.sf(stat, df)
        res.at[i, 2] = stat
        res.at[i, 3] = df
        res.at[i, 4] = pVal
        #Bonferroni adjustment, capped at 1
        res.at[i, 5] = res.loc[i, 4] * k
        if res.loc[i, 5] > 1:
            res.loc[i, 5] = 1
    res.columns = ["category", "n", "statistic", "df", "p-value", "adj. p-value"]
    return res
Functions
def ph_mcnemar_co(field1, field2, categories=None, exact=False, cc=False)
Post-Hoc McNemar Test - Collapsed
After a (McNemar-)Bowker test, a post-hoc test can potentially locate where the changes occurred. This can be done using a McNemar test, which is the Bowker test but for 2x2 tables.
There are two variations: either compare each possible pair of categories (pairwise comparison), or compare each category with all other categories combined (collapsed comparison). This function is for the collapsed version, see ph_mcnemar_pw() for the pairwise version.
Instead of using the McNemar test it is also possible to use the binomial test, which will be used if exact is set to True.
Parameters
field1 : list or pandas series - the first categorical field
field2 : list or pandas series - the second categorical field
categories : list or dictionary, optional - order and/or selection for categories of field1 and field2
exact : boolean, optional - use of exact binomial distribution (default is False)
cc : boolean, optional - use of a continuity correction (default is False)
Returns
A dataframe with:
- category, the specific category compared to all other categories
- n, the sample size
- statistic, the chi-squared value (if applicable)
- df, the degrees of freedom used in the test (if applicable)
- p-value, the significance (p-value)
- adj. p-value, the Bonferroni adjusted p-value
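Example
A minimal usage sketch: the paired data below is made up for illustration, and the import path is assumed from the module name at the top of this page, so treat it as a sketch rather than the package's documented example.

import pandas as pd
from stikpetP.other.poho_mcnemar_co import ph_mcnemar_co  # import path assumed from the module name above

# hypothetical paired responses, measured before and after some intervention
before = pd.Series(["low", "low", "mid", "high", "mid", "low", "high", "mid", "low", "mid"])
after = pd.Series(["mid", "low", "high", "high", "mid", "mid", "high", "low", "mid", "mid"])

# chi-square based post-hoc comparisons, one row per category versus all others
print(ph_mcnemar_co(before, after))

# exact binomial version with the mid-p correction
print(ph_mcnemar_co(before, after, exact=True, cc=True))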
Notes
The formula used is (McNemar, 1947, p. 156):
$$\chi_{M}^2 = \frac{\left(F_{1,2} - F_{2,1}\right)^2}{F_{1,2} + F_{2,1}}$$
$$df = 1$$
$$sig. = 1 - \chi^2\left(\chi_M^2, df\right)$$
If a continuity correction is applied the formula changes to:
$$\chi_{M*}^2 = \frac{\left(\left|F_{1,2} - F_{2,1}\right| - 1\right)^2}{F_{1,2} + F_{2,1}}$$
The formula used for the binomial test is:
$$sig. = 2\times\text{Bin}\left(F_{1,2} + F_{2,1}, \min\left(F_{1,2}, F_{2,1}\right), 0.5\right)$$
The formula used for the binomial test with a mid-p correction:
$$sig. = 2\times\text{Bin}\left(F_{1,2} + F_{2,1}, \min\left(F_{1,2}, F_{2,1}\right), 0.5\right) - \text{bin}\left(F_{1,2} + F_{2,1}, \min\left(F_{1,2}, F_{2,1}\right), 0.5\right)$$
The number of tests is the number of categories ( \(k\) ). The adjusted p-value is then determined using a Bonferroni correction:
$$sig._{adj} = \begin{cases} sig. \times k & \text{ if } sig. \times k \leq 1 \\ 1 & \text{ if } sig. \times k > 1 \end{cases}$$
Symbols used
- \(F_{1,2}\), the observed count of cases that scored category 1 on the first variable, and another category on the second.
- \(F_{2,1}\), the observed count of cases that scored another category on the first variable, and category 1 on the second.
- \(k\), the number of categories
- \(\chi^2\left(\dots\right)\), the cumulative distribution function for the chi-square distribution.
- \(\text{Bin}\left(\dots\right)\), the cumulative distribution function for the binomial distribution.
- \(\text{bin}\left(\dots\right)\), the probability mass function for the binomial distribution.
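To make the notation concrete, the sketch below evaluates the statistic, the exact p-value, and the Bonferroni adjustment for one collapsed comparison; F12, F21 and k are made-up values for illustration, not output of the function.

from scipy.stats import chi2, binom

F12, F21 = 14, 6   # hypothetical discordant counts for one category versus all others
k = 4              # hypothetical number of categories, i.e. the number of tests

# McNemar chi-square without continuity correction, df = 1
chi_m = (F12 - F21)**2 / (F12 + F21)
sig = chi2.sf(chi_m, 1)

# continuity-corrected statistic
chi_m_cc = (abs(F12 - F21) - 1)**2 / (F12 + F21)

# exact binomial version, and its mid-p correction
sig_exact = 2 * binom.cdf(min(F12, F21), F12 + F21, 0.5)
sig_midp = sig_exact - binom.pmf(min(F12, F21), F12 + F21, 0.5)

# Bonferroni adjusted p-value, capped at 1
sig_adj = min(sig * k, 1)

print(chi_m, sig, sig_exact, sig_adj)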
References
McNemar, Q. (1947). Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika, 12(2), 153–157. doi:10.1007/BF02295996
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076