Module stikpetP.effect_sizes.eff_size_cohen_kappa

Expand source code
import pandas as pd
from statistics import NormalDist
from ..other.table_cross import tab_cross

def es_cohen_kappa(field1, field2, categories=None):
    '''
    Cohen Kappa
    -----------
    An effect size measure that quantifies how strongly two raters or variables agree with each other. No agreement results in a kappa of 0, and full agreement in a kappa of 1.
    
    There are quite a few different measures of agreement. Neuendorf (2002, p. 162) refers to Popping (1988) who looked at 39 different measures and concluded that Cohen's kappa is the optimal one.
    
    Parameters
    ----------
    field1 : list or pandas series
        the first categorical field
    field2 : list or pandas series
        the second categorical field
    categories : list or dictionary, optional
        order and/or selection for categories of field1 and field2
        
    Returns
    -------
    A dataframe with:
    * *Kappa*, the kappa value.
    * *n*, the sample size
    * *statistic*, the test statistic (z-value)
    * *p-value*, the p-value (significance)
    
    Notes
    -----
    The formula used (Cohen, 1960, p. 44):
    $$\\kappa = \\frac{n\\times P - Q}{n^2 - Q} = \\frac{p_0 - p_c}{1 - p_c}$$
    
    With:
    $$P = \\sum_{i=1}^r F_{i,i}$$
    $$Q = \\sum_{i=1}^r R_{i}\\times C_{i}$$
    $$p_0 = \\frac{P}{n}$$
    $$p_c = \\frac{Q}{n^2}$$
    
    The asymptotic standard errors are calculated using (Fleiss et al., 1969, p. 325):
    $$ASE_0 = \\sqrt{\\frac{SS_0}{n\\times\\left(1 - p_c\\right)^2}}$$
    $$ASE_1 = \\sqrt{\\frac{SS_1}{n\\times\\left(1 - p_c\\right)^4}}$$
    
    With:
    $$SS_0 = \\left(\\sum_{i=1}^r p_{i,.}\\times p_{.,i}\\times\\left(1 - \\left(p_{i,.} + p_{.,i}\\right)\\right)^2\\right) - p_c^2 + \\sum_{i=1}^r \\sum_{j=1, i\\neq j}^c p_{i,.}\\times p_{.,j}\\times\\left(p_{.,i} + p_{j,.}\\right)^2$$
    $$SS_1 = \\left(\\sum_{i=1}^r p_{i,i}\\times\\left(\\left(1 - p_c\\right) - \\left(p_{.,i} + p_{i,.}\\right)\\times\\left(1 - p_0\\right)\\right)^2\\right) - \\left(p_0\\times p_c - 2\\times p_c + p_0\\right)^2 + \\left(1 - p_0\\right)^2 \\times \\sum_{i=1}^r \\sum_{j=1, i\\neq j}^c p_{i,j}\\times\\left(p_{.,i}+p_{j,.}\\right)^2$$
    $$p_{i,j} = \\frac{F_{i,j}}{n}$$
    $$p_{i,.} = \\frac{R_{i}}{n}$$
    $$p_{.,j} = \\frac{C_{j}}{n}$$
    
    Approximate asymptotic standard errors could also be calculated using (Cohen, 1960, pp. 40, 43):
    $$ASE_0 \\approx \\sqrt{\\frac{p_c}{n\\times\\left(1 - p_c\\right)}}$$
    $$ASE_1 \\approx \\sqrt{\\frac{p_0\\times\\left(1-p_0\\right)}{n\\times\\left(1 - p_c\\right)^2}}$$
    
    The p-value (significance) is then calculated using:
    $$z_{\\kappa} = \\frac{\\kappa}{ASE_0}$$
    $$sig. = 2\\times\\left(1 - \\Phi\\left(\\left|z_{\\kappa}\\right|\\right)\\right)$$
    
    *Symbols used*
    
    * \\(F_{i,j}\\), the observed count in row i and column j.
    * \\(r\\), the number of rows (categories in the first variable)
    * \\(c\\), the number of columns (categories in the second variable)
    * \\(n\\), the total number of scores
    * \\(R_i\\), the row total of row i. \\(R_i = \\sum_{j=1}^c F_{i,j}\\)
    * \\(C_j\\), the column total of column j. \\(C_j = \\sum_{i=1}^r F_{i,j}\\)
    
    References
    ----------
    Cohen, J. (1960). A coefficient of agreement for nominal scales. *Educational and Psychological Measurement, 20*(1), 37–46. doi:10.1177/001316446002000104
    
    Fleiss, J. L., Cohen, J., & Everitt, B. S. (1969). Large sample standard errors of kappa and weighted kappa. *Psychological Bulletin, 72*(5), 323–327. doi:10.1037/h0028106
    
    Neuendorf, K. A. (2002). *The content analysis guidebook*. SAGE Publications.
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076
    
    '''
    
    #create the cross table
    ct = tab_cross(field1, field2, categories, categories, totals="include")    
    
    #basic counts
    k = ct.shape[0]-1
    n = ct.iloc[k, k]
    
    #STEP 1: Convert to percentages based on grand total
    p = pd.DataFrame()
    for i in range(0, k + 1):
        for j in range(0, k + 1):
            p.at[i, j] = ct.iloc[i, j] / n
    
    #STEP 2: P and Q
    Pcap = 0
    QC = 0
    for i in range(0, k):
        Pcap = Pcap + ct.iloc[i, i]
        QC = QC + ct.iloc[i, k] * ct.iloc[k, i]
    p0 = Pcap / n
    pc = QC / (n**2)
    
    #Cohen kappa
    kappa = (p0 - pc) / (1 - pc)
    
    
    #TEST
    ss0P1 = 0
    ss0P2 = 0
    for i in range(0, k):
        ss0P1 = ss0P1 + p.iloc[i, k] * p.iloc[k, i] * (1 - (p.iloc[k, i] + p.iloc[i, k]))**2
        
        for j in range(0, k):
            if i != j:
                ss0P2 = ss0P2 + p.iloc[i, k] * p.iloc[k, j] * (p.iloc[k, i] + p.iloc[j, k])**2
    
    ss0 = ss0P1 + ss0P2 - pc**2
    ase0 = (ss0 / (n * (1 - pc)**2))**0.5
    
    z = kappa / ase0
    pValue = 2 * (1 - NormalDist().cdf(abs(z))) 
    
    #the results
    colnames = ["Kappa", "n", "statistic", "p-value"]
    results = pd.DataFrame([[kappa, n, z, pValue]], columns=colnames)
    
    return results

Functions

def es_cohen_kappa(field1, field2, categories=None)

Cohen Kappa

An effect size measure that quantifies how strongly two raters or variables agree with each other. No agreement results in a kappa of 0, and full agreement in a kappa of 1.

There are quite a few different measures of agreement. Neuendorf (2002, p. 162) refers to Popping (1988) who looked at 39 different measures and concluded that Cohen's kappa is the optimal one.

Parameters

field1 : list or pandas series
the first categorical field
field2 : list or pandas series
the second categorical field
categories : list or dictionary, optional
order and/or selection for categories of field1 and field2

Returns

A dataframe with:
 
  • Kappa, the kappa value.
  • n, the sample size
  • statistic, the test statistic (z-value)
  • p-value, the p-value (significance)

Notes

The formula used (Cohen, 1960, p. 44):

$$\kappa = \frac{n\times P - Q}{n^2 - Q} = \frac{p_0 - p_c}{1 - p_c}$$

With:

$$P = \sum_{i=1}^r F_{i,i}$$
$$Q = \sum_{i=1}^r R_{i}\times C_{i}$$
$$p_0 = \frac{P}{n}$$
$$p_c = \frac{Q}{n^2}$$
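The quantities above can be sketched in plain Python. The 2×2 confusion matrix below is made-up illustrative data, not output from the package:

```python
# Sketch: Cohen's kappa from a square confusion matrix (illustrative counts).
table = [[20, 5],
         [10, 15]]          # rows = rater 1, columns = rater 2

n = sum(sum(row) for row in table)                  # total number of scores
P = sum(table[i][i] for i in range(len(table)))     # sum of diagonal counts
R = [sum(row) for row in table]                     # row totals R_i
C = [sum(col) for col in zip(*table)]               # column totals C_j
Q = sum(R[i] * C[i] for i in range(len(table)))     # sum of R_i * C_i

p0 = P / n          # observed proportion of agreement
pc = Q / n**2       # proportion of agreement expected by chance
kappa = (p0 - pc) / (1 - pc)
print(kappa)        # ~ 0.4 for this table
```

The two forms of the formula agree: `(n*P - Q) / (n**2 - Q)` gives the same value without first converting to proportions.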

The asymptotic standard errors are calculated using (Fleiss et al., 1969, p. 325):

$$ASE_0 = \sqrt{\frac{SS_0}{n\times\left(1 - p_c\right)^2}}$$
$$ASE_1 = \sqrt{\frac{SS_1}{n\times\left(1 - p_c\right)^4}}$$

With:

$$SS_0 = \left(\sum_{i=1}^r p_{i,.}\times p_{.,i}\times\left(1 - \left(p_{i,.} + p_{.,i}\right)\right)^2\right) - p_c^2 + \sum_{i=1}^r \sum_{j=1, i\neq j}^c p_{i,.}\times p_{.,j}\times\left(p_{.,i} + p_{j,.}\right)^2$$
$$SS_1 = \left(\sum_{i=1}^r p_{i,i}\times\left(\left(1 - p_c\right) - \left(p_{.,i} + p_{i,.}\right)\times\left(1 - p_0\right)\right)^2\right) - \left(p_0\times p_c - 2\times p_c + p_0\right)^2 + \left(1 - p_0\right)^2 \times \sum_{i=1}^r \sum_{j=1, i\neq j}^c p_{i,j}\times\left(p_{.,i}+p_{j,.}\right)^2$$
$$p_{i,j} = \frac{F_{i,j}}{n}$$
$$p_{i,.} = \frac{R_{i}}{n}$$
$$p_{.,j} = \frac{C_{j}}{n}$$
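Only SS_0 is needed for the significance test (it is the variance term under the hypothesis of chance agreement). A minimal sketch of SS_0 and ASE_0, again with made-up counts; `pr` and `pcol` stand in for the marginal proportions p_{i,.} and p_{.,j}:

```python
# Sketch: Fleiss et al. (1969) asymptotic standard error under H0 (illustrative counts).
table = [[20, 5],
         [10, 15]]
r = len(table)
n = sum(sum(row) for row in table)
pr = [sum(row) / n for row in table]             # p_{i,.}, row marginal proportions
pcol = [sum(col) / n for col in zip(*table)]     # p_{.,j}, column marginal proportions
pc = sum(pr[i] * pcol[i] for i in range(r))      # chance agreement p_c

# First sum of SS_0: diagonal terms over the marginals
ss0 = sum(pr[i] * pcol[i] * (1 - (pr[i] + pcol[i]))**2 for i in range(r))
# Second sum of SS_0: off-diagonal terms
ss0 += sum(pr[i] * pcol[j] * (pcol[i] + pr[j])**2
           for i in range(r) for j in range(r) if i != j)
ss0 -= pc**2

ase0 = (ss0 / (n * (1 - pc)**2))**0.5
print(round(ase0, 4))   # ~ 0.1386 for this table
```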

Approximate asymptotic standard errors could also be calculated using (Cohen, 1960, pp. 40, 43):

$$ASE_0 \approx \sqrt{\frac{p_c}{n\times\left(1 - p_c\right)}}$$
$$ASE_1 \approx \sqrt{\frac{p_0\times\left(1-p_0\right)}{n\times\left(1 - p_c\right)^2}}$$

The p-value (significance) is then calculated using:

$$z_{\kappa} = \frac{\kappa}{ASE_0}$$
$$sig. = 2\times\left(1 - \Phi\left(\left|z_{\kappa}\right|\right)\right)$$
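The last step, sketched with Cohen's approximate ASE_0 for brevity (the proportions are from the same illustrative table, not from the package; `NormalDist` supplies the standard normal Φ):

```python
from statistics import NormalDist

# Sketch: z-test for kappa using Cohen's (1960) approximate ASE_0 (illustrative values).
n, p0, pc = 50, 0.7, 0.5            # sample size and agreement proportions
kappa = (p0 - pc) / (1 - pc)        # ~ 0.4

ase0 = (pc / (n * (1 - pc)))**0.5   # approximate standard error under H0, ~ 0.1414
z = kappa / ase0                    # ~ 2.83
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided significance
print(round(z, 3), round(p_value, 4))
```

The exact Fleiss standard error gives a slightly smaller ASE_0 here, so the approximation is a touch conservative for this table.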

Symbols used

  • F_{i,j}, the observed count in row i and column j.
  • r, the number of rows (categories in the first variable)
  • c, the number of columns (categories in the second variable)
  • n, the total number of scores
  • R_i, the row total of row i. R_i = \sum_{j=1}^c F_{i,j}
  • C_j, the column total of column j. C_j = \sum_{i=1}^r F_{i,j}

References

Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46. doi:10.1177/001316446002000104

Fleiss, J. L., Cohen, J., & Everitt, B. S. (1969). Large sample standard errors of kappa and weighted kappa. Psychological Bulletin, 72(5), 323–327. doi:10.1037/h0028106

Neuendorf, K. A. (2002). The content analysis guidebook. SAGE Publications.

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
