Module stikpetP.tests.test_cochran_q

import pandas as pd
from scipy.stats import chi2

def ts_cochran_q(data, success=None):
    '''
    Cochran Q Test
    --------------
    A test for multiple binary variables measured on the same cases. The null hypothesis is that the proportion of successes is the same in all categories.
    
    If the p-value (sig.) is below a chosen threshold (usually .05), the null hypothesis is rejected: at least one category has a significantly different proportion of successes than at least one other category, in the population.
    
    If the test is significant (below the threshold) a post-hoc Dunn test could be used, or pairwise McNemar-Bowker.
    
    Parameters
    ----------
    data : dataframe
        dataframe with a column for each category
    success : object, optional
        the value that represents a 'success'. If None (default), the value in the first row of the first column (after rows with missing values are dropped) is used as success.
        
    Returns
    -------
    res : dataframe
        test results with the following columns
    
    * *n*, the sample size
    * *statistic*, the test statistic (chi-square value)
    * *df*, the degrees of freedom
    * *p-value*, the p-value (significance)
    
    Notes
    -----
    The formula used (Cochran, 1950, p. 259):
    $$Q = \\frac{k\\times\\left(k-1\\right)\\times\\sum_{j=1}^k\\left(C_j-\\bar{C}\\right)^2}{k\\times n_s - \\sum_{i=1}^n R_i^2}$$
    $$df = k - 1$$
    $$sig. = 1 - \\chi^2\\left(Q, df\\right)$$
    
    *Symbols used*
    
    * \\(C_j\\), the number of successes in category j
    * \\(\\bar{C}\\), the mean number of successes per category, \\(n_s / k\\)
    * \\(k\\), the number of categories (factors)
    * \\(R_i\\), the number of successes in case i
    * \\(n_s\\), the total number of successes
    * \\(n\\), the number of cases
    * \\(\\chi^2\\left(\\dots\\right)\\), the chi-square cumulative distribution function
    
    References
    ----------
    Cochran, W. G. (1950). The comparison of percentages in matched samples. *Biometrika, 37*(3/4), 256–266. doi:10.2307/2332378

    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076
    
    '''
    
    #Remove rows with missing values and reset index
    data = data.dropna()    
    data = data.reset_index(drop=True)
    
    n = len(data)
    k = len(data.columns)
    
    if success is None:
        suc = data.iloc[0,0]
    else:
        suc = success
    
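    # Boolean frame marking every cell that equals the success value
    isSuc = data == suc
    cj = isSuc.sum()          # successes per category (column totals)
    ri = isSuc.sum(axis=1)    # successes per case (row totals)
    ns = cj.sum()             # total number of successes
    cm = ns/k                 # mean number of successes per category
    
    # Cochran's Q, its degrees of freedom, and the upper-tail chi-square p-value
    q = k*(k - 1)*((cj - cm)**2).sum() / (k*ns - (ri**2).sum())
    df = k - 1
    pVal = chi2.sf(q, df)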
    
    res = pd.DataFrame([[n, q, df, pVal]])
    res.columns = ["n", "statistic", "df", "p-value"]
    
    return res

Functions

def ts_cochran_q(data, success=None)

Cochran Q Test

A test for multiple binary variables measured on the same cases. The null hypothesis is that the proportion of successes is the same in all categories.

If the p-value (sig.) is below a chosen threshold (usually .05), the null hypothesis is rejected: at least one category has a significantly different proportion of successes than at least one other category, in the population.

If the test is significant (below the threshold) a post-hoc Dunn test could be used, or pairwise McNemar-Bowker.
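As a rough illustration of the pairwise follow-up mentioned above, the sketch below runs an uncorrected McNemar test on every pair of columns (for two binary variables the McNemar-Bowker test reduces to the classic McNemar statistic). The data, the variable names, and the Bonferroni adjustment are assumptions made for this example and are not part of this module.

```python
import math
from itertools import combinations

# Hypothetical data: 6 cases (rows) scored on 3 binary variables (1 = success).
rows = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
    [0, 1, 0],
]

k = len(rows[0])
pairs = list(combinations(range(k), 2))

results = {}
for a, b in pairs:
    # Discordant cases: success on one variable but not on the other.
    only_a = sum(1 for row in rows if row[a] == 1 and row[b] == 0)
    only_b = sum(1 for row in rows if row[a] == 0 and row[b] == 1)
    discordant = only_a + only_b
    if discordant == 0:
        continue  # the statistic is undefined without discordant cases

    # McNemar chi-square statistic (no continuity correction), df = 1.
    statistic = (only_a - only_b) ** 2 / discordant
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    # Bonferroni adjustment for the number of comparisons (an assumption here).
    p_adjusted = min(1.0, p_value * len(pairs))
    results[(a, b)] = (statistic, p_value, p_adjusted)

for pair, (statistic, p_value, p_adjusted) in results.items():
    print(pair, round(statistic, 3), round(p_value, 4), round(p_adjusted, 4))
```

With so few discordant cases the chi-square approximation is crude; the example is only meant to show the mechanics of the pairwise comparison.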

Parameters

data : dataframe
dataframe with a column for each category
success : object, optional
the value that represents a 'success'. If None (default), the value in the first row of the first column (after rows with missing values are dropped) is used as success.
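To make the default for success concrete, the following sketch repeats the preprocessing steps from the source code on a small, made-up data frame. The column names and values are hypothetical; only the pandas calls mirror what the function does.

```python
import pandas as pd

# Hypothetical survey data; the first row has a missing value.
data = pd.DataFrame({
    "item1": [None, "no", "yes", "yes"],
    "item2": ["yes", "yes", "no", "yes"],
    "item3": ["no", "no", "no", "yes"],
})

# Same preprocessing as the function: drop incomplete rows, reset the index.
complete = data.dropna().reset_index(drop=True)

# With success=None, the first cell of the first column is used.
default_success = complete.iloc[0, 0]

# Every cell equal to that value is counted as a success.
is_success = complete == default_success
successes_per_column = is_success.sum()

print(default_success)
print(successes_per_column.tolist())
```

Here the default ends up being "no", because the row that started with a missing value was dropped first. When the data contain only two distinct values, Q is the same whichever of the two is counted as success, but with more than two distinct values every other value is treated as a failure, so passing success explicitly is the safer choice.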

Returns

res : dataframe
test results with the following columns
  • n, the sample size
  • statistic, the test statistic (chi-square value)
  • df, the degrees of freedom
  • p-value, the p-value (significance)

Notes

The formula used (Cochran, 1950, p. 259):
$$Q = \frac{k\times\left(k-1\right)\times\sum_{j=1}^k\left(C_j-\bar{C}\right)^2}{k\times n_s - \sum_{i=1}^n R_i^2}$$
$$df = k - 1$$
$$sig. = 1 - \chi^2\left(Q, df\right)$$

Symbols used

  • C_j, the number of successes in category j
  • \bar{C}, the mean number of successes per category, n_s / k
  • k, the number of categories (factors)
  • R_i, the number of successes in case i
  • n_s, the total number of successes
  • n, the number of cases
  • \chi^2\left(\dots\right), the chi-square cumulative distribution function
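The formula can be traced by hand on a small data set. The sketch below uses only the standard library and made-up data with k = 3, so that df = 2 and the chi-square upper tail has the closed form exp(-Q / 2). The variable names are chosen for this example and do not appear in the module.

```python
import math

# Hypothetical data: 6 cases (rows) scored on k = 3 binary variables (1 = success).
rows = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
    [0, 1, 0],
]

n = len(rows)
k = len(rows[0])

col_totals = [sum(row[j] for row in rows) for j in range(k)]  # C_j
row_totals = [sum(row) for row in rows]                       # R_i
n_s = sum(col_totals)                                         # total successes
c_bar = n_s / k                                               # mean of the C_j

numerator = k * (k - 1) * sum((c - c_bar) ** 2 for c in col_totals)
denominator = k * n_s - sum(r ** 2 for r in row_totals)
q = numerator / denominator
df = k - 1

# For df = 2 the chi-square survival function is exactly exp(-Q / 2).
p_value = math.exp(-q / 2)

print(col_totals, row_totals, n_s)
print(round(q, 4), df, round(p_value, 4))
```

For these data C = (5, 4, 1), the sum of the squared R_i is 20, and n_s = 10, giving Q = 52 / 10 = 5.2 with df = 2.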

References

Cochran, W. G. (1950). The comparison of percentages in matched samples. Biometrika, 37(3/4), 256–266. doi:10.2307/2332378

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
