Module stikpetP.tests.test_cochran_q
import pandas as pd
from scipy.stats import chi2
def ts_cochran_q(data, success=None):
    '''
    Cochran Q Test
    --------------
    A test for multiple binary variables. The null hypothesis is that the proportion of successes is the same in all categories.

    If the p-value (sig.) is below a chosen threshold (usually .05), the null hypothesis is rejected: at least one category has a significantly different proportion of successes than at least one other category in the population.

    If the test is significant (p-value below the threshold), a post-hoc Dunn test or pairwise McNemar-Bowker tests could be used.

    Parameters
    ----------
    data : dataframe
        dataframe with a column for each category
    success : object, optional
        the value that represents a 'success'. If None (default), the value in the first row of the first column (after rows with missing values are dropped) is used as success.

    Returns
    -------
    res : dataframe
        test results with the following columns

        * *n*, the sample size
        * *statistic*, the test statistic (chi-square value)
        * *df*, the degrees of freedom
        * *p-value*, the p-value (significance)

    Notes
    -----
    The formula used (Cochran, 1950, p. 259):
    $$Q = \\frac{k\\times\\left(k-1\\right)\\times\\sum_{j=1}^k\\left(C_j-\\bar{C}\\right)^2}{k\\times n_s - \\sum_{i=1}^n R_i^2}$$
    $$df = k - 1$$
    $$sig. = 1 - \\chi^2\\left(Q, df\\right)$$

    *Symbols used*

    * \\(C_j\\), the number of successes in category j
    * \\(\\bar{C}\\), the mean number of successes per category, \\(n_s / k\\)
    * \\(k\\), the number of categories (factors)
    * \\(R_i\\), the number of successes in case i
    * \\(n_s\\), the total number of successes
    * \\(n\\), the number of cases
    * \\(\\chi^2\\left(\\dots\\right)\\), the chi-square cumulative distribution function

    References
    ----------
    Cochran, W. G. (1950). The comparison of percentages in matched samples. *Biometrika, 37*(3/4), 256–266. doi:10.2307/2332378

    Author
    ------
    Made by P. Stikker

    Companion website: https://PeterStatistics.com
    YouTube channel: https://www.youtube.com/stikpet
    Donations: https://www.patreon.com/bePatron?u=19398076
    '''
    # Remove rows with missing values and reset index
    data = data.dropna()
    data = data.reset_index(drop=True)

    n = len(data)
    k = len(data.columns)

    # Determine which value counts as a success
    if success is None:
        suc = data.iloc[0, 0]
    else:
        suc = success

    # Boolean table of successes, with column totals (C_j) and row totals (R_i)
    isSuc = data == suc
    cj = isSuc.sum()
    ri = isSuc.sum(axis=1)
    ns = cj.sum()
    cm = ns/k

    # Test statistic, degrees of freedom and p-value (upper tail of chi-square)
    q = k*(k - 1)*((cj - cm)**2).sum() / (k*ns - (ri**2).sum())
    df = k - 1
    pVal = chi2.sf(q, df)

    res = pd.DataFrame([[n, q, df, pVal]])
    res.columns = ["n", "statistic", "df", "p-value"]
    return res
Functions
def ts_cochran_q(data, success=None)
Cochran Q Test
A test for multiple binary variables. The null hypothesis is that the proportion of successes is the same in all categories.
If the p-value (sig.) is below a chosen threshold (usually .05), the null hypothesis is rejected: at least one category has a significantly different proportion of successes than at least one other category in the population.
If the test is significant (p-value below the threshold), a post-hoc Dunn test or pairwise McNemar-Bowker tests could be used.
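As a rough illustration of the pairwise follow-up mentioned above, the sketch below runs a McNemar test on every pair of categories. It is not the library's post-hoc routine: the helper `mcnemar`, the example data, the omission of a continuity correction, and the Bonferroni adjustment are all assumptions made for this example. For two binary variables the McNemar-Bowker test reduces to the McNemar test, which is why the simpler form is used here.

```python
import math
from itertools import combinations


def mcnemar(x, y):
    """Hypothetical helper: McNemar chi-square (no continuity correction)
    for two paired lists coded 1 = success, 0 = failure."""
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    if b + c == 0:
        # No discordant pairs; this sketch reports "no difference"
        return 0.0, 1.0
    stat = (b - c) ** 2 / (b + c)
    # Upper tail of the chi-square distribution with df = 1
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Hypothetical data: three categories measured on the same six cases
columns = {
    "A": [1, 1, 1, 1, 1, 0],
    "B": [1, 0, 1, 0, 1, 0],
    "C": [0, 0, 0, 0, 1, 0],
}

pairs = list(combinations(columns, 2))
results = {}
for first, second in pairs:
    stat, p = mcnemar(columns[first], columns[second])
    # Bonferroni adjustment (an assumption of this sketch), capped at 1
    results[(first, second)] = (stat, min(1.0, p * len(pairs)))

for pair, (stat, p_adj) in results.items():
    print(pair, round(stat, 3), round(p_adj, 4))
```

Only cases where the two categories disagree (the discordant pairs) contribute to each pairwise statistic.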
Parameters
data : dataframe
dataframe with a column for each category
success : object, optional
the value that represents a 'success'. If None (default), the first value found (first row, first column, after rows with missing values are dropped) is used as success.
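The effect of the `success` argument can be checked with a small pure-Python sketch. The helper `cochran_q` below is a hypothetical re-implementation of the formula given in the Notes, not the library function, and the data are made up. When every cell holds one of exactly two values, swapping which value counts as a success leaves Q unchanged; once a third value appears, the choice matters, so passing `success` explicitly is the safer option.

```python
def cochran_q(rows, success):
    """Hypothetical helper: Cochran's Q from a list of equal-length rows."""
    hits = [[value == success for value in row] for row in rows]
    k = len(hits[0])
    c = [sum(col) for col in zip(*hits)]   # successes per category
    r = [sum(row) for row in hits]         # successes per case
    n_s = sum(c)
    c_bar = n_s / k
    numerator = k * (k - 1) * sum((cj - c_bar) ** 2 for cj in c)
    denominator = k * n_s - sum(ri ** 2 for ri in r)
    return numerator / denominator


# Strictly two-valued data: the coding of 'success' does not change Q
rows = [
    ("yes", "yes", "no"),
    ("yes", "no", "no"),
    ("yes", "yes", "yes"),
    ("yes", "no", "no"),
]
q_yes = cochran_q(rows, "yes")
q_no = cochran_q(rows, "no")
print(round(q_yes, 4), round(q_no, 4))

# A third value ("maybe") counts as a failure under either coding,
# so the two choices no longer mirror each other
rows_with_other = rows + [("maybe", "yes", "no")]
q_yes3 = cochran_q(rows_with_other, "yes")
q_no3 = cochran_q(rows_with_other, "no")
print(round(q_yes3, 4), round(q_no3, 4))
```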
Returns
res : dataframe
test results with the following columns
- n, the sample size
- statistic, the test statistic (chi-square value)
- df, the degrees of freedom
- p-value, the p-value (significance)
Notes
The formula used (Cochran, 1950, p. 259):
$$Q = \frac{k\times\left(k-1\right)\times\sum_{j=1}^k\left(C_j-\bar{C}\right)^2}{k\times n_s - \sum_{i=1}^n R_i^2}$$
$$df = k - 1$$
$$sig. = 1 - \chi^2\left(Q, df\right)$$
Symbols used
- \(C_j\), the number of successes in category j
- \(\bar{C}\), the mean number of successes per category, \(n_s / k\)
- \(k\), the number of categories (factors)
- \(R_i\), the number of successes in case i
- \(n_s\), the total number of successes
- \(n\), the number of cases
- \(\chi^2\left(\dots\right)\), the chi-square cumulative distribution function
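The formula and symbols above can be traced by hand on a small table. The sketch below uses hypothetical data (six cases, three categories, 1 = success) and plain Python rather than the library function. Because k = 3 gives df = 2, the p-value uses the closed form exp(-Q/2) for the chi-square upper tail, a shortcut that holds only for df = 2 and is an assumption of this example.

```python
import math

# Hypothetical data: 6 cases (rows) on k = 3 binary categories (columns)
rows = [
    (1, 1, 0),
    (1, 0, 0),
    (1, 1, 0),
    (1, 0, 0),
    (1, 1, 1),
    (0, 0, 0),
]

k = len(rows[0])
c = [sum(col) for col in zip(*rows)]   # C_j: [5, 3, 1]
r = [sum(row) for row in rows]         # R_i: [2, 1, 2, 1, 3, 0]
n_s = sum(c)                           # total successes: 9
c_bar = n_s / k                        # mean successes per category: 3.0

numerator = k * (k - 1) * sum((cj - c_bar) ** 2 for cj in c)   # 3 * 2 * 8 = 48
denominator = k * n_s - sum(ri ** 2 for ri in r)               # 27 - 19 = 8
q = numerator / denominator
df = k - 1

# Valid for df = 2 only: chi-square upper tail equals exp(-Q / 2)
p_value = math.exp(-q / 2)

print(q, df, round(p_value, 4))   # 6.0 2 0.0498
```

The denominator can also be written as the sum of R_i (k - R_i) over all cases, so cases with all successes or all failures contribute nothing to it. If every case is of that kind the denominator is zero and Q is undefined.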
References
Cochran, W. G. (1950). The comparison of percentages in matched samples. Biometrika, 37(3/4), 256–266. doi:10.2307/2332378
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076