Module stikpetP.other.poho_dunn_q
import pandas as pd
from statistics import NormalDist
def ph_dunn_q(data, success=None):
    '''
    Post-Hoc Dunn Test (for Cochran Q test)
    ---------------------------------------
    An adaptation by IBM SPSS of the Dunn test, so that it can be used as a post-hoc test for a Cochran Q test.

    Parameters
    ----------
    data : dataframe
        dataframe with a column for each category
    success : object, optional
        the value that represents a 'success'. If None (default), the value in the first row of the first column (after removing cases with missing values) is used as success.

    Returns
    -------
    res : dataframe
        test results with the following columns

        * *category 1*, label of first variable in comparison
        * *category 2*, label of second variable in comparison
        * *n suc. 1*, number of successes in first variable in comparison
        * *n suc. 2*, number of successes in second variable in comparison
        * *statistic*, test statistic (difference in the proportion of successes)
        * *z-value*, standardized test statistic (z-value)
        * *p-value*, two-sided p-value of the z-value
        * *adj. p-value*, Bonferroni corrected p-value

    Notes
    -----
    The formula used (IBM, 2021, p. 814):
    $$z_{1,2} = \\frac{\\bar{d}_{1,2}}{SE}$$
    $$sig. = 2\\times\\left(1-\\Phi\\left(\\left|z_{1,2}\\right|\\right)\\right)$$

    With:
    $$\\bar{d}_{1,2} = \\frac{ns_1 - ns_2}{n}$$
    $$SE = \\sqrt{2\\times\\frac{k\\times\\sum_{i=1}^n R_i - \\sum_{i=1}^n R_i^2}{n^2\\times k\\times\\left(k-1\\right)}}$$
    $$R_i = \\sum_{j=1}^k s_{i,j}$$
    $$ns_j = \\sum_{i=1}^n s_{i,j}$$
    $$s_{i,j} = \\begin{cases} 1 & \\text{ if } x_{i,j}= \\text{success} \\\\ 0 & \\text{ if } x_{i,j} \\neq \\text{success} \\end{cases}$$

    IBM SPSS mentions this is an adaptation from Dunn (1964), originally for the Kruskal-Wallis test.

    The Bonferroni adjustment is done using:
    $$sig._{adj} = \\min \\left(sig. \\times n_c, 1\\right)$$
    $$n_c = \\frac{k\\times\\left(k-1\\right)}{2}$$

    *Symbols used*

    * \\(x_{i,j}\\), the score in row i and column j
    * \\(k\\), the number of variables
    * \\(n\\), the total number of cases used
    * \\(ns_j\\), the total number of successes in column j
    * \\(R_i\\), the total number of successes in row i
    * \\(\\Phi\\left(\\dots\\right)\\), the standard normal cumulative distribution function.
    * \\(n_c\\), the number of comparisons (pairs)

    References
    ----------
    Dunn, O. J. (1964). Multiple comparisons using rank sums. *Technometrics, 6*(3), 241–252. doi:10.1080/00401706.1964.10490181

    IBM. (2021). IBM SPSS Statistics Algorithms. IBM.

    Author
    ------
    Made by P. Stikker

    Companion website: https://PeterStatistics.com
    YouTube channel: https://www.youtube.com/stikpet
    Donations: https://www.patreon.com/bePatron?u=19398076
    '''
    # list-wise deletion: only keep cases without missing values
    df = data.dropna()
    varNames = df.columns
    k = len(varNames)
    n = df.shape[0]

    if success is None:
        # default: first value of the first complete case
        suc = df.iloc[0, 0]
    else:
        suc = success

    # recode success to 1 and everything else to 0
    # (a single comparison, so a success value of 0 is also handled correctly)
    df = (df == suc).astype(int)

    # number of successes per row, their total, and their sum of squares
    rs = df.sum(axis=1)
    rst = rs.sum()
    rs2t = (rs**2).sum()

    # standard error
    se = (2 * (k * rst - rs2t) / (k * (k - 1) * n**2))**0.5

    # number of comparisons
    ncomp = k * (k - 1) / 2

    # the pairwise comparisons
    rows = []
    for i in range(k - 1):
        for j in range(i + 1, k):
            # create the pair
            cat1 = varNames[i]
            cat2 = varNames[j]

            # number of successes in each variable of the pair
            n1 = int(df[cat1].sum())
            n2 = int(df[cat2].sum())

            # difference in proportions and its z-value
            t = (n1 - n2) / n
            z = t / se

            # two-sided p-value and Bonferroni adjustment
            pVal = 2 * (1 - NormalDist().cdf(abs(z)))
            pAdj = min(pVal * ncomp, 1)

            rows.append([cat1, cat2, n1, n2, t, z, pVal, pAdj])

    res = pd.DataFrame(rows, columns=["category 1", "category 2", "n suc. 1", "n suc. 2", "statistic", "z-value", "p-value", "adj. p-value"])

    return res
Functions
def ph_dunn_q(data, success=None)
Post-Hoc Dunn Test (for Cochran Q test)
An adaptation by IBM SPSS of the Dunn test, so that it can be used as a post-hoc test for a Cochran Q test.
Parameters
data : dataframe
- dataframe with a column for each category
success : object, optional
- the value that represents a 'success'. If None (default), the value in the first row of the first column (after removing cases with missing values) is used as success.
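The recoding implied by the `success` parameter can be sketched without pandas. This is a minimal illustration, not the package's own code: `recode_success` is a hypothetical helper, and missing values are assumed to be represented by `None`.

```python
# Hypothetical helper (not part of the package): recode raw scores to 0/1.
def recode_success(rows, success=None):
    """Return the complete rows recoded to 1 (success) and 0 (anything else)."""
    # Drop every case (row) that contains a missing value.
    complete = [row for row in rows if all(value is not None for value in row)]
    if success is None:
        # Default: the first value of the first complete case.
        success = complete[0][0]
    return [[1 if value == success else 0 for value in row] for row in complete]


rows = [
    ["yes", "no", "yes"],
    ["no", None, "yes"],  # incomplete case, dropped
    ["no", "no", "yes"],
]
coded = recode_success(rows)
print(coded)
```

Because the default depends on which value happens to come first in the data, passing `success` explicitly is usually the safer choice.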
Returns
res : dataframe
- test results with the following columns
- category 1, label of first variable in comparison
- category 2, label of second variable in comparison
- n suc. 1, number of successes in first variable in comparison
- n suc. 2, number of successes in second variable in comparison
- statistic, test statistic
- z-value, standardized test statistic (z-value)
- p-value, p-value of the z-value
- adj. p-value, Bonferroni corrected p-value
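The result holds one row per pair of columns. As a rough sketch of which pairs appear and in what order, assuming hypothetical column labels `A` to `D`, the pairs can be enumerated with `itertools.combinations`:

```python
from itertools import combinations

columns = ["A", "B", "C", "D"]  # hypothetical column labels

# Every unordered pair of columns, first column before second.
pairs = list(combinations(columns, 2))

k = len(columns)
n_pairs = k * (k - 1) // 2  # number of comparisons for k columns

for cat1, cat2 in pairs:
    print(cat1, "vs", cat2)
```

With four columns this gives six comparisons, so the result would have six rows.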
Notes
The formula used (IBM, 2021, p. 814):
$$z_{1,2} = \frac{\bar{d}_{1,2}}{SE}$$
$$sig. = 2\times\left(1-\Phi\left(\left|z_{1,2}\right|\right)\right)$$
With:
$$\bar{d}_{1,2} = \frac{ns_1 - ns_2}{n}$$
$$SE = \sqrt{2\times\frac{k\times\sum_{i=1}^n R_i - \sum_{i=1}^n R_i^2}{n^2\times k\times\left(k-1\right)}}$$
$$R_i = \sum_{j=1}^k s_{i,j}$$
$$ns_j = \sum_{i=1}^n s_{i,j}$$
$$s_{i,j} = \begin{cases} 1 & \text{ if } x_{i,j}= \text{success} \\ 0 & \text{ if } x_{i,j} \neq \text{success} \end{cases}$$
IBM SPSS mentions this is an adaptation from Dunn (1964), originally for the Kruskal-Wallis test.
The Bonferroni adjustment is done using:
$$sig._{adj} = \min \left(sig. \times n_c, 1\right)$$
$$n_c = \frac{k\times\left(k-1\right)}{2}$$
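The Bonferroni adjustment can be illustrated in a few lines. The function name `bonferroni` is a hypothetical one chosen for this sketch; it simply applies the two formulas above.

```python
def bonferroni(p_value, k):
    """Bonferroni-adjust a p-value for all pairwise comparisons of k variables."""
    n_c = k * (k - 1) / 2           # number of comparisons (pairs)
    return min(p_value * n_c, 1.0)  # capped at 1


# With k = 4 variables there are 6 pairs, so each p-value is multiplied by 6.
print(bonferroni(0.01, 4))
# A large unadjusted p-value is capped at 1.
print(bonferroni(0.4, 3))
```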
Symbols used
- \(x_{i,j}\), the score in row i and column j
- \(k\), the number of variables
- \(n\), the total number of cases used
- \(ns_j\), the total number of successes in column j
- \(R_i\), the total number of successes in row i
- \(\Phi\left(\dots\right)\), the standard normal cumulative distribution function.
- \(n_c\), the number of comparisons (pairs)
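To make the symbols concrete, the following sketch works through the formulas for a single pair using only the standard library. The 0/1 data are made up for illustration, and the variable names follow the symbols above rather than the package's source.

```python
from statistics import NormalDist

# Hypothetical data, already coded 1 = success, 0 = otherwise:
# 6 cases (rows) by 3 variables (columns).
s = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 0],
    [1, 1, 0],
]
n = len(s)     # number of cases
k = len(s[0])  # number of variables

R = [sum(row) for row in s]                         # successes per row
ns = [sum(row[j] for row in s) for j in range(k)]   # successes per column

# Standard error, shared by all pairwise comparisons.
se = (2 * (k * sum(R) - sum(r**2 for r in R)) / (n**2 * k * (k - 1))) ** 0.5

# Compare the first and third variable.
d = (ns[0] - ns[2]) / n                   # difference in proportions
z = d / se                                # standardized statistic
p = 2 * (1 - NormalDist().cdf(abs(z)))    # two-sided p-value

print(f"SE = {se:.4f}, d = {d:.4f}, z = {z:.4f}, p = {p:.4f}")
```

Note that the standard error depends only on the row totals, so it is computed once and reused for every pair.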
References
Dunn, O. J. (1964). Multiple comparisons using rank sums. Technometrics, 6(3), 241–252. doi:10.1080/00401706.1964.10490181
IBM. (2021). IBM SPSS Statistics Algorithms. IBM.
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076