Module stikpetP.effect_sizes.eff_size_cohen_kappa
import pandas as pd
from statistics import NormalDist
from ..other.table_cross import tab_cross
def es_cohen_kappa(field1, field2, categories=None):
'''
Cohen Kappa
-----------
An effect size measure of how strongly two raters or variables agree with each other. No agreement results in a kappa of 0, and full agreement in a kappa of 1.
There are quite a few different measures of agreement. Neuendorf (2002, p. 162) refers to Popping (1988) who looked at 39 different measures and concluded that Cohen's kappa is the optimal one.
Parameters
----------
field1 : list or pandas series
the first categorical field
field2 : list or pandas series
the second categorical field
categories : list or dictionary, optional
order and/or selection for categories of field1 and field2
Returns
-------
A dataframe with:
* *Kappa*, the kappa value.
* *n*, the sample size
* *statistic*, the test statistic (z-value)
* *p-value*, the p-value (significance)
Notes
-----
The formula used (Cohen, 1960, p. 44):
$$\\kappa = \\frac{n\\times P - Q}{n^2 - Q} = \\frac{p_0 - p_c}{1 - p_c}$$
With:
$$P = \\sum_{i=1}^r F_{i,i}$$
$$Q = \\sum_{i=1}^r R_{i}\\times C_{i}$$
$$p_0 = \\frac{P}{n}$$
$$p_c = \\frac{Q}{n^2}$$
The asymptotic standard errors are calculated using (Fleiss et al., 1969, p. 325):
$$ASE_0 = \\sqrt{\\frac{SS_0}{n\\times\\left(1 - p_c\\right)^2}}$$
$$ASE_1 = \\sqrt{\\frac{SS_1}{n\\times\\left(1 - p_c\\right)^4}}$$
With:
$$SS_0 = \\left(\\sum_{i=1}^r p_{i,.}\\times p_{.,i}\\times\\left(1 - \\left(p_{i,.} + p_{.,i}\\right)\\right)^2\\right) - p_c^2 + \\sum_{i=1}^r \\sum_{j=1, i\\neq j}^c p_{i,.}\\times p_{.,j}\\times\\left(p_{.,i} + p_{j,.}\\right)^2$$
$$SS_1 = \\left(\\sum_{i=1}^r p_{i,i}\\times\\left(\\left(1 - p_c\\right) - \\left(p_{.,i} + p_{i,.}\\right)\\times\\left(1 - p_0\\right)\\right)^2\\right) - \\left(p_0\\times p_c - 2\\times p_c + p_0\\right)^2 + \\left(1 - p_0\\right)^2 \\times \\sum_{i=1}^r \\sum_{j=1, i\\neq j}^c p_{i,j}\\times\\left(p_{.,i}+p_{j,.}\\right)^2$$
$$p_{i,j} = \\frac{F_{i,j}}{n}$$
$$p_{i,.} = \\frac{R_{i}}{n}$$
$$p_{.,j} = \\frac{C_{j}}{n}$$
Approximate asymptotic standard errors could also be calculated using (Cohen, 1960, pp. 40, 43):
$$ASE_0 \\approx \\sqrt{\\frac{p_c}{n\\times\\left(1 - p_c\\right)}}$$
$$ASE_1 \\approx \\sqrt{\\frac{p_0\\times\\left(1-p_0\\right)}{n\\times\\left(1 - p_c\\right)^2}}$$
The p-value (significance) is then calculated using:
$$z_{\\kappa} = \\frac{\\kappa}{ASE_0}$$
$$sig. = 2\\times\\left(1 - \\Phi\\left(z_{\\kappa}\\right)\\right)$$
*Symbols used*
* \\(F_{i,j}\\), the observed count in row i and column j.
* \\(r\\), the number of rows (categories in the first variable)
* \\(c\\), the number of columns (categories in the second variable)
* \\(n\\), the total number of scores
* \\(R_i\\), the row total of row i. \\(R_i = \\sum_{j=1}^c F_{i,j}\\)
* \\(C_j\\), the column total of column j. \\(C_j = \\sum_{i=1}^r F_{i,j}\\)
References
----------
Cohen, J. (1960). A coefficient of agreement for nominal scales. *Educational and Psychological Measurement, 20*(1), 37–46. doi:10.1177/001316446002000104
Fleiss, J. L., Cohen, J., & Everitt, B. S. (1969). Large sample standard errors of kappa and weighted kappa. *Psychological Bulletin, 72*(5), 323–327. doi:10.1037/h0028106
Neuendorf, K. A. (2002). *The content analysis guidebook*. SAGE Publications.
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
'''
    # create the cross table (with row and column totals included)
    ct = tab_cross(field1, field2, categories, categories, totals="include")
    # basic counts (k = number of categories, n = grand total)
    k = ct.shape[0] - 1
    n = ct.iloc[k, k]
    # STEP 1: convert to proportions based on the grand total
    p = pd.DataFrame()
    for i in range(0, k + 1):
        for j in range(0, k + 1):
            p.at[i, j] = ct.iloc[i, j] / n
    # STEP 2: P (sum of diagonal counts) and Q (sum of R_i * C_i)
    Pcap = 0
    QC = 0
    for i in range(0, k):
        Pcap = Pcap + ct.iloc[i, i]
        QC = QC + ct.iloc[i, k] * ct.iloc[k, i]
    p0 = Pcap / n
    pc = QC / (n**2)
    # Cohen's kappa
    kappa = (p0 - pc) / (1 - pc)
    # TEST: asymptotic standard error under H0 (Fleiss et al., 1969)
    ss0P1 = 0
    ss0P2 = 0
    for i in range(0, k):
        ss0P1 = ss0P1 + p.iloc[i, k] * p.iloc[k, i] * (1 - (p.iloc[k, i] + p.iloc[i, k]))**2
        for j in range(0, k):
            if i != j:
                ss0P2 = ss0P2 + p.iloc[i, k] * p.iloc[k, j] * (p.iloc[k, i] + p.iloc[j, k])**2
    ss0 = ss0P1 + ss0P2 - pc**2
    ase0 = (ss0 / (n * (1 - pc)**2))**0.5
    z = kappa / ase0
    pValue = 2 * (1 - NormalDist().cdf(abs(z)))
    # the results
    colnames = ["Kappa", "n", "statistic", "p-value"]
    results = pd.DataFrame([[kappa, n, z, pValue]], columns=colnames)
    return results
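The function above depends on the package-internal `tab_cross` helper, so it cannot be run in isolation. As a self-contained cross-check of the formulas in the Notes, here is a minimal sketch using `pandas.crosstab` instead; the `rater1`/`rater2` data and names are illustrative, not from the module:

```python
import pandas as pd
from statistics import NormalDist

# illustrative ratings from two raters (hypothetical example data)
rater1 = ["yes"] * 25 + ["no"] * 25
rater2 = ["yes"] * 20 + ["no"] * 5 + ["yes"] * 10 + ["no"] * 15

# cross table of observed counts F_{i,j}
ct = pd.crosstab(pd.Series(rater1), pd.Series(rater2))
F = ct.to_numpy()
n = F.sum()
R = F.sum(axis=1)   # row totals R_i
C = F.sum(axis=0)   # column totals C_j

p0 = F.trace() / n            # observed agreement p_0 = P / n
pc = (R * C).sum() / n**2     # chance agreement p_c = Q / n^2
kappa = (p0 - pc) / (1 - pc)  # Cohen's kappa

# ASE under H0 (Fleiss et al., 1969), mirroring the SS_0 formula
pr = R / n    # p_{i,.}
pco = C / n   # p_{.,j}
r = len(pr)
ss0 = sum(pr[i] * pco[i] * (1 - (pr[i] + pco[i]))**2 for i in range(r))
ss0 += sum(pr[i] * pco[j] * (pco[i] + pr[j])**2
           for i in range(r) for j in range(r) if i != j)
ss0 -= pc**2
ase0 = (ss0 / (n * (1 - pc)**2))**0.5
z = kappa / ase0
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
```

For these counts the result is kappa = 0.4 with z about 2.89. As a sanity check, Cohen's approximate \(ASE_0 \approx \sqrt{p_c / (n (1 - p_c))}\) gives about 0.141 here, close to the exact Fleiss value of about 0.139.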