Module stikpetP.tests.test_fisher_owa
Expand source code
import pandas as pd
from scipy.stats import f
def ts_fisher_owa(nomField, scaleField, categories=None):
    '''
Fisher/Classic One-Way ANOVA / F-Test
----------------------
Tests if the means (averages) of each category could be the same in the population.
If the p-value is below a pre-defined threshold (usually 0.05), the null hypothesis is rejected, meaning that at least two categories have a different mean on the scaleField scores in the population.
There are quite a few alternatives for this test; the stikpet library offers Welch, James, Box, Scott-Smith, Brown-Forsythe, Alexander-Govern, Mehrotra modified Brown-Forsythe, Hartung-Agac-Makabi, Özdemir-Kurt and Wilcox as options. See the notes for some discussion of the differences.
Parameters
----------
nomField : pandas series
data with categories
scaleField : pandas series
data with the scores
categories : list, optional
the categories of nomField to use
Returns
-------
Dataframe with an ANOVA table showing:
* *variance*, which variance is shown in that row
* *SS*, sum of squared deviations from the mean
* *df*, degrees of freedom
* *MS*, the mean square
* *F*, the F-statistic value
* *p-value*, the p-value (significance)
Notes
-----
The formula used is:
$$F_{Fisher} = \\frac{MS_b}{MS_w}$$
$$df_b = k - 1$$
$$df_w = n - k$$
$$sig. = 1 - F\\left(F_{Fisher}, df_b, df_w\\right)$$
With:
$$MS_b = \\frac{SS_b}{df_b}$$
$$MS_w = \\frac{SS_w}{df_w}$$
$$SS_b = \\sum_{j=1}^k n_j\\times\\left(\\bar{x}_j - \\bar{x}\\right)^2$$
$$SS_w = \\sum_{j=1}^k \\sum_{i=1}^{n_j} \\left(x_{i,j} - \\bar{x}_j\\right)^2$$
$$\\bar{x}_j = \\frac{\\sum_{i=1}^{n_j} x_{i,j}}{n_j}$$
$$\\bar{x} = \\frac{\\sum_{j=1}^k n_j \\times \\bar{x}_j}{n} = \\frac{\\sum_{j=1}^k \\sum_{i=1}^{n_j} x_{i,j}}{n}$$
$$n = \\sum_{j=1}^k n_j$$
*Symbols used:*
* \\(x_{i,j}\\), the i-th score in category j
* \\(n\\), the total sample size
* \\(n_j\\), the number of scores in category j
* \\(k\\), the number of categories
* \\(\\bar{x}_j\\), the mean of the scores in category j
* \\(MS_i\\), the mean square of i
* \\(SS_i\\), the sum of squares of i (sum of squared deviation of the mean)
* \\(df_i\\), the degrees of freedom of i
* \\(b\\), is between = factor = treatment = model
* \\(w\\), is within = error (the variability within the groups)
The formula most likely originated from Fisher (1921).
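To make the formulas concrete, here is a small worked sketch with made-up scores (scipy assumed available): it computes \\(SS_b\\), \\(SS_w\\) and the F-statistic directly from the definitions above, then cross-checks the result against scipy.stats.f_oneway. Note that f.sf, the survival function, is the numerically stable way to evaluate \\(1 - F\\left(F_{Fisher}, df_b, df_w\\right)\\).

```python
from scipy.stats import f, f_oneway

# made-up example: scores in k = 3 categories
groups = [[3, 5, 4, 6], [7, 8, 6, 9], [2, 3, 4, 3]]
n = sum(len(g) for g in groups)          # total sample size
k = len(groups)                          # number of categories
grand = sum(sum(g) for g in groups) / n  # overall mean

# SS_b: weighted squared deviations of the category means from the grand mean
ssb = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
# SS_w: squared deviations of each score from its own category mean
ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

dfb, dfw = k - 1, n - k
fVal = (ssb / dfb) / (ssw / dfw)
p = f.sf(fVal, dfb, dfw)   # survival function = 1 - CDF

# cross-check against scipy's own one-way ANOVA
fRef, pRef = f_oneway(*groups)
```

For these scores the hand computation gives \\(SS_b = 42\\), \\(SS_w = 12\\) and \\(F = 15.75\\), matching scipy.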
**Choosing a test**
The classic/Fisher one-way ANOVA assumes the data is normally distributed and that the variances in each group are the same in the population (homoscedasticity). Many have tried to cover the situations when one or both of these conditions are not met.
Delacre et al. (2019) recommend using the Welch ANOVA instead of the classic and Brown-Forsythe versions. How2stats (2018) gives a slightly different recommendation, based on Tomarken and Serlin (1986): usually the Welch ANOVA is preferred over the classic version, but if the average sample size is below six, the Brown-Forsythe should still be used.
The researchers in the previous paragraph did not take other approaches into consideration. A few comments on those other methods follow.
According to Hartung et al. (2002, p. 225) the Cochran test is the standard test in meta-analysis, but should not be used, since it is always too liberal.
Schneider and Penfield (1997) looked at the Welch, Alexander-Govern and the James test (they ignored the Brown-Forsythe since they found it to perform worse than Welch or James), and concluded: “Under variance heterogeneity, Alexander-Govern’s approximation was not only comparable to the Welch test and the James second-order test but was superior, in certain instances, when coupled with the power results for those tests” (p. 285).
Cavus and Yazici (2020) compared many different tests. They showed that the Brown-Forsythe, Box correction, Cochran, Hartung-Agac-Makabi adjusted Welch, and Scott-Smith test, all do not perform well, compared to the Asiribo-Gurland correction, Alexander-Govern test, Özdemir-Kurt B2, Mehrotra modified Brown-Forsythe, and Welch.
I only came across the Johansen test in Algina et al. (1991), and it appears to give the same results as the Welch test.
In my experience the one-way ANOVA is widely known and often discussed in textbooks. The Welch ANOVA is gaining popularity. The Brown-Forsythe is already more obscure, and some confuse it with the Brown-Forsythe test for variances. The James test and the Alexander-Govern test are perhaps the least known, and the Johansen test even less so (at least they were for me). So, although the Alexander-Govern test might be preferred over the Welch test, some researchers prefer a commonly used test over a more obscure one. In the end it is up to you to decide which test might be best, and depending on the importance of your research you might want to investigate which test fits your situation best, rather than taking my word for it.
Besides these, there are more methods: some use simulation (bootstrapping) (see Cavus and Yazici (2020) for a few of them), others use different techniques (see Yiğit and Gökpinar (2010) for a few more methods not covered here).
References
----------
Algina, J., Oshima, T. C., & Tang, K. L. (1991). Robustness of Yao’s, James’, and Johansen’s Tests under variance-covariance heteroscedasticity and nonnormality. *Journal of Educational Statistics, 16*(2), 125–139. doi:10.2307/1165116
Cavus, M., & Yazıcı, B. (2020). Testing the equality of normal distributed and independent groups’ means under unequal variances by doex package. *The R Journal, 12*(2), 134. doi:10.32614/RJ-2021-008
Delacre, M., Leys, C., Mora, Y. L., & Lakens, D. (2019). Taking parametric assumptions seriously: Arguments for the use of Welch’s F-test instead of the classical F-test in one-way ANOVA. *International Review of Social Psychology, 32*(1), 1–12. doi:10.5334/irsp.198
Fisher, R. A. (1921). On the “probable error” of a coefficient of correlation deduced from a small sample. *Metron, 1*, 3–32.
Hartung, J., Argaç, D., & Makambi, K. H. (2002). Small sample properties of tests on homogeneity in one-way anova and meta-analysis. *Statistical Papers, 43*(2), 197–235. doi:10.1007/s00362-002-0097-8
how2stats (Director). (2018, June 11). Welch’s F-test vs Brown-Forsythe F-test: Which Should You Use and When? https://youtu.be/jteKmatBgF8
Schneider, P. J., & Penfield, D. A. (1997). Alexander and Govern’s approximation: Providing an alternative to ANOVA under variance heterogeneity. *The Journal of Experimental Education, 65*(3), 271–286. doi:10.1080/00220973.1997.9943459
Tomarken, A. J., & Serlin, R. C. (1986). Comparison of ANOVA alternatives under variance heterogeneity and specific noncentrality structures. *Psychological Bulletin, 99*(1), 90–99. doi:10.1037/0033-2909.99.1.90
Yiğit, E., & Gökpinar, F. (2010). A simulation study on tests for one-way ANOVA under the unequal variance assumption. *Communications, Faculty of Science, University of Ankara*, 15–34. doi:10.1501/Commua1_0000000660
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
    '''
    if isinstance(nomField, list):
        nomField = pd.Series(nomField)
    if isinstance(scaleField, list):
        scaleField = pd.Series(scaleField)
    data = pd.concat([nomField, scaleField], axis=1)
    data.columns = ["category", "score"]
    # remove unused categories
    if categories is not None:
        data = data[data.category.isin(categories)]
    # remove rows with missing values and reset the index
    data = data.dropna().reset_index(drop=True)
    # overall n, mean, and total sum of squares
    n = len(data)
    m = data.score.mean()
    sst = data.score.var()*(n - 1)
    # sample sizes and means per category
    nj = data.groupby('category')['score'].count()
    mj = data.groupby('category')['score'].mean()
    # number of categories
    k = len(mj)
    # between and within sums of squares, degrees of freedom
    ssb = float((nj*(mj - m)**2).sum())
    ssw = sst - ssb
    dfb = k - 1
    dfw = n - k
    dft = n - 1
    # mean squares, F-statistic and right-tail p-value
    msb = ssb/dfb
    msw = ssw/dfw
    fVal = msb/msw
    p = f.sf(fVal, dfb, dfw)
    # assemble the ANOVA table
    res = pd.DataFrame()
    res["variance"] = ["between", "within", "total"]
    res["SS"] = [ssb, ssw, sst]
    res["df"] = [dfb, dfw, dft]
    res["MS"] = [msb, msw, None]
    res["F"] = [fVal, None, None]
    res["p-value"] = [p, None, None]
    return res
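As a usage-style check of the code above, the following standalone sketch mirrors the function's pandas pipeline on made-up data (the two Series play the roles of nomField and scaleField) and verifies the resulting F and p-value against scipy.stats.f_oneway:

```python
import pandas as pd
from scipy.stats import f, f_oneway

# made-up input in the shape the function expects
nomField = pd.Series(["a", "a", "a", "b", "b", "b", "c", "c", "c"], name="category")
scaleField = pd.Series([3.0, 5.0, 4.0, 7.0, 8.0, 6.0, 2.0, 3.0, 4.0], name="score")

data = pd.concat([nomField, scaleField], axis=1).dropna()
n, k = len(data), data["category"].nunique()
m = data["score"].mean()
sst = data["score"].var() * (n - 1)          # total sum of squares
nj = data.groupby("category")["score"].count()
mj = data.groupby("category")["score"].mean()
ssb = float((nj * (mj - m) ** 2).sum())      # between sum of squares
ssw = sst - ssb                              # within, by subtraction
fVal = (ssb / (k - 1)) / (ssw / (n - k))
p = f.sf(fVal, k - 1, n - k)

# cross-check against scipy's one-way ANOVA
fRef, pRef = f_oneway(*(g["score"] for _, g in data.groupby("category")))
```

With these scores the category means are 4, 7 and 3, giving \(SS_b = 26\), \(SS_w = 6\) and \(F = 13\), which scipy reproduces.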
Functions
def ts_fisher_owa(nomField, scaleField, categories=None)