Module stikpetP.tests.test_welch_owa
import pandas as pd
from scipy.stats import f
def ts_welch_owa(nomField, scaleField, categories=None):
    '''
Welch One-Way ANOVA
-------------------
Tests if the means (averages) of each category could be the same in the population.
If the p-value is below a pre-defined threshold (usually 0.05), the null hypothesis is rejected, and at least two categories are then assumed to have a different mean scaleField score in the population.
Delacre et al. (2019) recommend using the Welch ANOVA instead of the classic and Brown-Forsythe versions, but there are quite a few alternatives. The stikpet library has Fisher, Welch, James, Box, Scott-Smith, Brown-Forsythe, Alexander-Govern, Mehrotra modified Brown-Forsythe, Hartung-Agac-Makabi, Özdemir-Kurt and Wilcox as options. See the notes of ts_fisher_owa() for some discussion of the differences.
Parameters
----------
nomField : pandas series
data with categories
scaleField : pandas series
data with the scores
categories : list or dictionary, optional
the categories to use from nomField
Returns
-------
Dataframe with:
* *n*, the sample size
* *statistic*, the test statistic (F value)
* *df1*, the numerator degrees of freedom
* *df2*, the denominator degrees of freedom
* *p-value*, the p-value (significance)
Notes
-----
The formula used (Welch, 1951, p. 334):
$$F_{Welch}=\\frac{\\frac{1}{k-1}\\times\\sum_{j=1}^k w_j\\times\\left(\\bar{x}_j - \\bar{y}_w\\right)^2}{1+2\\times\\lambda\\times\\frac{k-2}{k^2-1}}$$
$$df_1 = k - 1$$
$$df_2 = \\frac{k^2 - 1}{3\\times \\lambda}$$
$$sig. = 1 - F\\left(F_{Welch}, df_1, df_2\\right)$$
With:
$$\\lambda = \\sum_{j=1}^k\\frac{\\left(1-h_j\\right)^2}{n_j -1}$$
$$\\bar{y}_w = \\sum_{j=1}^k h_j\\times\\bar{x}_j$$
$$h_j = \\frac{w_j}{w}$$
$$w = \\sum_{j=1}^k w_j$$
$$w_j = \\frac{n_j}{s_j^2}$$
$$s_j^2 = \\frac{\\sum_{i=1}^{n_j}\\left(x_{i,j} - \\bar{x}_j\\right)^2}{n_j - 1}$$
$$\\bar{x}_j = \\frac{\\sum_{i=1}^{n_j} x_{i,j}}{n_j}$$
*Symbols used:*
* \\(x_{i,j}\\), the i-th score in category j
* \\(k\\), the number of categories
* \\(n_j\\), the sample size of category j
* \\(\\bar{x}_j\\), the sample mean of category j
* \\(s_j^2\\), the sample variance of the scores in category j
* \\(h_j\\), the adjusted weight for category j
* \\(w_j\\), the weight for category j
* \\(w\\), the sum of the weights over all categories
* \\(df_i\\), the i-th degrees of freedom.
Cavus and Yazici (2020) distinguish between the Welch and the Welch-Aspin ANOVA. The only difference in the article is that the Welch version uses \\(2\\times\\left(k-2\\right)\\), while the Welch-Aspin version uses \\(2\\times k-2\\). I think this is a mistake in their formula, since the article by Aspin that they refer to is about two means.
The Johansen F test (Johansen, 1980) will give the same results.
References
----------
Cavus, M., & Yazıcı, B. (2020). Testing the equality of normal distributed and independent groups’ means under unequal variances by doex package. *The R Journal, 12*(2), 134. doi:10.32614/RJ-2021-008
Delacre, M., Leys, C., Mora, Y. L., & Lakens, D. (2019). Taking parametric assumptions seriously: Arguments for the use of Welch’s F-test instead of the classical F-test in one-way ANOVA. *International Review of Social Psychology, 32*(1), 1–12. doi:10.5334/irsp.198
Johansen, S. (1980). The Welch-James approximation to the distribution of the residual sum of squares in a weighted linear regression. *Biometrika, 67*(1), 85–92. doi:10.1093/biomet/67.1.85
Welch, B. L. (1947). The generalization of ‘Student’s’ problem when several different population variances are involved. *Biometrika, 34*(1/2), 28–35. doi:10.2307/2332510
Welch, B. L. (1951). On the comparison of several mean values: An alternative approach. *Biometrika, 38*(3/4), 330–336. doi:10.2307/2332579
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
'''
    #accept plain lists as well as pandas series
    if isinstance(nomField, list):
        nomField = pd.Series(nomField)
    if isinstance(scaleField, list):
        scaleField = pd.Series(scaleField)

    data = pd.concat([nomField, scaleField], axis=1)
    data.columns = ["category", "score"]

    #remove unused categories
    if categories is not None:
        data = data[data["category"].isin(categories)]

    #Remove rows with missing values and reset index
    data = data.dropna()
    data = data.reset_index(drop=True)

    #overall n
    n = len(data["category"])

    #sample sizes, variances and means per category (as series)
    scores = data.groupby('category')['score']
    nj = scores.count()
    sj2 = scores.var()
    mj = scores.mean()

    #number of categories
    k = len(mj)

    #weights, adjusted weights, weighted mean and lambda
    wj = nj / sj2
    w = float(wj.sum())
    hj = wj/w
    yw = float((hj*mj).sum())
    lm = float(((1 - hj)**2/(nj - 1)).sum())

    #test statistic, the denominator equals (k - 1)*(1 + 2*lm*(k - 2)/(k**2 - 1))
    chi2Val = float((wj*(mj - yw)**2).sum())
    fVal = chi2Val / ((k - 1)+2*lm*(k - 2)/(k+1))

    #degrees of freedom and p-value
    df1 = k - 1
    df2 = (k**2 - 1)/(3*lm)
    pVal = f.sf(fVal, df1, df2)

    #results
    res = pd.DataFrame([[n, fVal, df1, df2, pVal]])
    res.columns = ["n", "statistic", "df1", "df2", "p-value"]

    return res
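The line that computes `fVal` in the source above divides by \((k-1)+2\lambda(k-2)/(k+1)\), which looks different from the formula in the docstring. The two forms should be algebraically the same, as this short derivation sketches (it only uses \(k^2-1=(k-1)(k+1)\)):

```latex
\begin{align*}
F_{Welch}
  &= \frac{\frac{1}{k-1}\sum_{j=1}^k w_j\left(\bar{x}_j-\bar{y}_w\right)^2}
          {1+2\lambda\frac{k-2}{k^2-1}} \\
  &= \frac{\sum_{j=1}^k w_j\left(\bar{x}_j-\bar{y}_w\right)^2}
          {\left(k-1\right)\left(1+\frac{2\lambda\left(k-2\right)}{\left(k-1\right)\left(k+1\right)}\right)} \\
  &= \frac{\sum_{j=1}^k w_j\left(\bar{x}_j-\bar{y}_w\right)^2}
          {\left(k-1\right)+\frac{2\lambda\left(k-2\right)}{k+1}}
\end{align*}
```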
Functions
def ts_welch_owa(nomField, scaleField, categories=None)
Welch One-Way ANOVA
Tests if the means (averages) of each category could be the same in the population.
If the p-value is below a pre-defined threshold (usually 0.05), the null hypothesis is rejected, and at least two categories are then assumed to have a different mean scaleField score in the population.
Delacre et al. (2019) recommend using the Welch ANOVA instead of the classic and Brown-Forsythe versions, but there are quite a few alternatives. The stikpet library has Fisher, Welch, James, Box, Scott-Smith, Brown-Forsythe, Alexander-Govern, Mehrotra modified Brown-Forsythe, Hartung-Agac-Makabi, Özdemir-Kurt and Wilcox as options. See the notes of ts_fisher_owa() for some discussion of the differences.
Parameters
nomField : pandas series
data with categories
scaleField : pandas series
data with the scores
categories : list or dictionary, optional
the categories to use from nomField
Returns
Dataframe with:
- n, the sample size
- statistic, the test statistic (F value)
- df1, the numerator degrees of freedom
- df2, the denominator degrees of freedom
- p-value, the p-value (significance)
Notes
The formula used (Welch, 1951, p. 334):
$$F_{Welch}=\frac{\frac{1}{k-1}\times\sum_{j=1}^k w_j\times\left(\bar{x}_j - \bar{y}_w\right)^2}{1+2\times\lambda\times\frac{k-2}{k^2-1}}$$
$$df_1 = k - 1$$
$$df_2 = \frac{k^2 - 1}{3\times \lambda}$$
$$sig. = 1 - F\left(F_{Welch}, df_1, df_2\right)$$
With:
$$\lambda = \sum_{j=1}^k\frac{\left(1-h_j\right)^2}{n_j -1}$$
$$\bar{y}_w = \sum_{j=1}^k h_j\times\bar{x}_j$$
$$h_j = \frac{w_j}{w}$$
$$w = \sum_{j=1}^k w_j$$
$$w_j = \frac{n_j}{s_j^2}$$
$$s_j^2 = \frac{\sum_{i=1}^{n_j}\left(x_{i,j} - \bar{x}_j\right)^2}{n_j - 1}$$
$$\bar{x}_j = \frac{\sum_{i=1}^{n_j} x_{i,j}}{n_j}$$
Symbols used:
- x_{i,j}, the i-th score in category j
- k, the number of categories
- n_j, the sample size of category j
- \bar{x}_j, the sample mean of category j
- s_j^2, the sample variance of the scores in category j
- h_j, the adjusted weight for category j
- w_j, the weight for category j
- w, the sum of the weights over all categories
- df_i, the i-th degrees of freedom.
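The formulas and symbols above can be sketched in plain Python. This is only an illustration under a few assumptions: the scores arrive as one list per category (not the pandas series that ts_welch_owa() takes), the helper name `welch_f` is hypothetical and not part of the stikpet library, and the p-value is left out because the standard library has no F distribution.

```python
from statistics import mean, variance


def welch_f(groups):
    """Return (F_Welch, df1, df2) for a list of score lists (hypothetical helper)."""
    k = len(groups)                                # number of categories
    n = [len(g) for g in groups]                   # n_j
    x_bar = [mean(g) for g in groups]              # sample means
    s2 = [variance(g) for g in groups]             # sample variances (n_j - 1 divisor)

    w_j = [nj / s2j for nj, s2j in zip(n, s2)]     # weights
    w = sum(w_j)                                   # sum of the weights
    h = [wj / w for wj in w_j]                     # adjusted weights
    y_w = sum(hj * xj for hj, xj in zip(h, x_bar)) # weighted mean
    lam = sum((1 - hj) ** 2 / (nj - 1) for hj, nj in zip(h, n))

    numerator = sum(wj * (xj - y_w) ** 2 for wj, xj in zip(w_j, x_bar)) / (k - 1)
    denominator = 1 + 2 * lam * (k - 2) / (k ** 2 - 1)

    return numerator / denominator, k - 1, (k ** 2 - 1) / (3 * lam)


# three categories with equal sizes and equal variances, means 2, 3 and 5
F, df1, df2 = welch_f([[1, 2, 3], [2, 3, 4], [4, 5, 6]])
print(F, df1, df2)
```

For this small data set the hand calculation gives F = 6 with df1 = 2 and df2 = 4, so the printed values should match those up to floating-point rounding.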
Cavus and Yazici (2020) distinguish between the Welch and the Welch-Aspin ANOVA. The only difference in the article is that the Welch version uses \(2\times\left(k-2\right)\), while the Welch-Aspin version uses \(2\times k-2\). I think this is a mistake in their formula, since the article by Aspin that they refer to is about two means.
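To see what the \(2\times\left(k-2\right)\) term does in the two-means case, note that it vanishes for k = 2, so the denominator of the Welch F becomes 1. The sketch below checks numerically that the statistic then equals the squared Welch t statistic and that df2 equals the Welch-Satterthwaite degrees of freedom. Both helper names are hypothetical, the input is assumed to be plain lists, and the data are made up for illustration.

```python
from statistics import mean, variance


def welch_f(groups):
    """Welch F, df1 and df2 from the formulas in the notes (hypothetical helper)."""
    k = len(groups)
    n = [len(g) for g in groups]
    x_bar = [mean(g) for g in groups]
    w_j = [len(g) / variance(g) for g in groups]
    w = sum(w_j)
    h = [wj / w for wj in w_j]
    y_w = sum(hj * xj for hj, xj in zip(h, x_bar))
    lam = sum((1 - hj) ** 2 / (nj - 1) for hj, nj in zip(h, n))
    num = sum(wj * (xj - y_w) ** 2 for wj, xj in zip(w_j, x_bar)) / (k - 1)
    den = 1 + 2 * lam * (k - 2) / (k ** 2 - 1)   # equals 1 when k = 2
    return num / den, k - 1, (k ** 2 - 1) / (3 * lam)


def welch_t_squared(a, b):
    """Squared Welch t statistic and Welch-Satterthwaite df for two samples."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t2 = (mean(a) - mean(b)) ** 2 / (va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t2, df


a, b = [1, 2, 3], [2, 4, 6, 8]
f_val, df1, df2 = welch_f([a, b])
t2, df = welch_t_squared(a, b)
print(f_val, t2)   # both should agree
print(df2, df)     # both should agree
```

Using \(2\times k-2\) instead would leave a non-zero correction at k = 2, so the result would no longer reduce to the two-sample Welch test.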
The Johansen F test (Johansen, 1980) will give the same results.
References
Cavus, M., & Yazıcı, B. (2020). Testing the equality of normal distributed and independent groups’ means under unequal variances by doex package. The R Journal, 12(2), 134. doi:10.32614/RJ-2021-008
Delacre, M., Leys, C., Mora, Y. L., & Lakens, D. (2019). Taking parametric assumptions seriously: Arguments for the use of Welch’s F-test instead of the classical F-test in one-way ANOVA. International Review of Social Psychology, 32(1), 1–12. doi:10.5334/irsp.198
Johansen, S. (1980). The Welch-James approximation to the distribution of the residual sum of squares in a weighted linear regression. Biometrika, 67(1), 85–92. doi:10.1093/biomet/67.1.85
Welch, B. L. (1947). The generalization of ‘Student’s’ problem when several different population variances are involved. Biometrika, 34(1/2), 28–35. doi:10.2307/2332510
Welch, B. L. (1951). On the comparison of several mean values: An alternative approach. Biometrika, 38(3/4), 330–336. doi:10.2307/2332579
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076