Module stikpetP.tests.test_welch_owa

import pandas as pd
from scipy.stats import f

def ts_welch_owa(nomField, scaleField, categories=None):
    '''
    Welch One-Way ANOVA
    -------------------    
    Tests whether the means (averages) of all categories could be equal in the population.
    
    If the p-value is below a pre-defined threshold (usually 0.05), the null hypothesis is rejected, and at least two categories will then have different means on the scaleField score in the population.
    
    Delacre et al. (2019) recommend using the Welch ANOVA instead of the classic and Brown-Forsythe versions. There are, however, quite a few alternatives; the stikpet library has Fisher, Welch, James, Box, Scott-Smith, Brown-Forsythe, Alexander-Govern, Mehrotra modified Brown-Forsythe, Hartung-Agac-Makabi, Özdemir-Kurt and Wilcox as options. See the notes of ts_fisher_owa() for some discussion of the differences.
    
    Parameters
    ----------
    nomField : pandas series
        data with categories
    scaleField : pandas series
        data with the scores
    categories : list or dictionary, optional
        the categories of nomField to use
    
    Returns
    -------
    Dataframe with:
    
    * *n*, the sample size
    * *statistic*, the test statistic (F value)
    * *df1*, the first (numerator) degrees of freedom
    * *df2*, the second (denominator) degrees of freedom
    * *p-value*, the p-value (significance)
    
    Notes
    -----
    The formula used (Welch, 1951, p. 334):
    $$F_{Welch}=\\frac{\\frac{1}{k-1}\\times\\sum_{j=1}^k w_j\\times\\left(\\bar{x}_j - \\bar{y}_w\\right)^2}{1+2\\times\\lambda\\times\\frac{k-2}{k^2-1}}$$
    $$df_1 = k - 1$$
    $$df_2 = \\frac{k^2 - 1}{3\\times \\lambda}$$
    $$sig. = 1 - F\\left(F_{Welch}, df_1, df_2\\right)$$
    
    With:
    $$\\lambda = \\sum_{j=1}^k\\frac{\\left(1-h_j\\right)^2}{n_j -1}$$
    $$\\bar{y}_w = \\sum_{j=1}^k h_j\\times\\bar{x}_j$$
    $$h_j = \\frac{w_j}{w}$$
    $$w = \\sum_{j=1}^k w_j$$
    $$w_j = \\frac{n_j}{s_j^2}$$
    $$s_j^2 = \\frac{\\sum_{i=1}^{n_j}\\left(x_{i,j} - \\bar{x}_j\\right)^2}{n_j - 1}$$
    $$\\bar{x}_j = \\frac{\\sum_{i=1}^{n_j} x_{i,j}}{n_j}$$
    
    *Symbols used:*
    
    * \\(x_{i,j}\\), the i-th score in category j
    * \\(k\\), the number of categories
    * \\(n_j\\), the sample size of category j
    * \\(\\bar{x}_j\\), the sample mean of category j
    * \\(s_j^2\\), the sample variance of the scores in category j
    * \\(w_j\\), the weight for category j
    * \\(h_j\\), the adjusted weight for category j
    * \\(df_i\\), the i-th degrees of freedom.    
    
    Cavus and Yazici (2020) distinguish between the Welch and the Welch-Aspin ANOVA. The only difference in their article is that the Welch version uses \\(2\\times\\left(k-2\\right)\\), while the Welch-Aspin version uses \\(2\\times k-2\\). I think this is a mistake in their formula, since the Aspin article they refer to concerns only two means.
    
    The Johansen F test (Johansen, 1980) will give the same results.
    
    References
    ----------
    Cavus, M., & Yazıcı, B. (2020). Testing the equality of normal distributed and independent groups’ means under unequal variances by doex package. *The R Journal, 12*(2), 134. doi:10.32614/RJ-2021-008

    Delacre, M., Leys, C., Mora, Y. L., & Lakens, D. (2019). Taking parametric assumptions seriously: Arguments for the use of Welch’s F-test instead of the classical F-test in one-way ANOVA. *International Review of Social Psychology, 32*(1), 1–12. doi:10.5334/irsp.198
    
    Johansen, S. (1980). The Welch-James approximation to the distribution of the residual sum of squares in a weighted linear regression. *Biometrika, 67*(1), 85–92. doi:10.1093/biomet/67.1.85
    
    Welch, B. L. (1947). The generalization of ‘Student’s’ problem when several different population variances are involved. *Biometrika, 34*(1/2), 28–35. doi:10.2307/2332510
    
    Welch, B. L. (1951). On the comparison of several mean values: An alternative approach. *Biometrika, 38*(3/4), 330–336. doi:10.2307/2332579
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076

    
    '''
    
    if isinstance(nomField, list):
        nomField = pd.Series(nomField)
        
    if isinstance(scaleField, list):
        scaleField = pd.Series(scaleField)
        
    data = pd.concat([nomField, scaleField], axis=1)
    data.columns = ["category", "score"]
    
    #remove unused categories
    if categories is not None:
        data = data[data.category.isin(categories)]
    
    #remove rows with missing values and reset the index
    data = data.dropna()
    data = data.reset_index(drop=True)
    
    #overall sample size
    n = len(data)
    
    #sample sizes, variances and means per category
    grp = data.groupby('category')['score']
    nj = grp.count()
    sj2 = grp.var()
    mj = grp.mean()
    
    #number of categories
    k = len(mj)
    
    #weights, adjusted weights, weighted grand mean and lambda
    wj = nj / sj2
    w = wj.sum()
    hj = wj / w
    yw = (hj * mj).sum()
    lm = ((1 - hj)**2 / (nj - 1)).sum()
    
    #test statistic, degrees of freedom and p-value
    chi2Val = (wj * (mj - yw)**2).sum()
    fVal = chi2Val / ((k - 1) + 2*lm*(k - 2)/(k + 1))
    
    df1 = k - 1
    df2 = (k**2 - 1) / (3*lm)
    pVal = f.sf(fVal, df1, df2)
    
    #results
    res = pd.DataFrame([[n, fVal, df1, df2, pVal]])
    res.columns = ["n", "statistic", "df1", "df2", "p-value"]
    
    return res
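As a quick check of the formulas in the notes, the Welch statistic can be computed step by step on a small made-up dataset. The scores, category labels, and variable names below are purely illustrative; they simply mirror the symbols \(w_j\), \(h_j\), \(\bar{y}_w\) and \(\lambda\) from the docstring.

```python
import pandas as pd
from scipy.stats import f

# small made-up dataset with three categories
scores = pd.Series([2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0, 1.0, 2.0, 2.0, 3.0])
groups = pd.Series(["a"]*4 + ["b"]*3 + ["c"]*4)

grp = pd.DataFrame({"category": groups, "score": scores}).groupby("category")["score"]
nj, sj2, mj = grp.count(), grp.var(), grp.mean()
k = len(mj)

wj = nj / sj2                           # weights w_j = n_j / s_j^2
hj = wj / wj.sum()                      # adjusted weights h_j = w_j / w
yw = (hj * mj).sum()                    # weighted grand mean
lam = ((1 - hj)**2 / (nj - 1)).sum()    # lambda

# Welch F, degrees of freedom and p-value
fVal = (wj * (mj - yw)**2).sum() / ((k - 1) + 2*lam*(k - 2)/(k + 1))
df1 = k - 1
df2 = (k**2 - 1) / (3*lam)
pVal = f.sf(fVal, df1, df2)
print(round(fVal, 4), df1, round(df2, 2), round(pVal, 4))
```

With the clearly separated groups above, the test rejects equality of means at the usual 0.05 level; the same data passed to ts_welch_owa() should reproduce these numbers.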

Functions

def ts_welch_owa(nomField, scaleField, categories=None)
