Module stikpetP.effect_sizes.eff_size_eta_sq

import pandas as pd
from ..other.table_cross import tab_cross

def es_eta_sq(catField, ordField, categories=None, levels=None, useRanks=False):
    '''
    Eta Squared
    ---------------
    An effect size measure to indicate the strength of the influence of the categories on the ordinal/scale field. A value of 0 indicates no influence, and 1 a perfect relationship.
    
    It is “the proportion of the variation in Y that is associated with membership of the different groups defined by X” (Richardson, 2011, p. 136).
    
    An alternative, epsilon-squared, attempts to make eta-squared unbiased by applying a population correction ratio (Kelley, 1935, p. 557). Although omega-squared is popularly believed to be preferable to epsilon-squared (Keselman, 1975), a later study suggested that epsilon-squared might be preferred (Okada, 2013).
    
    Tomczak and Tomczak (2014) recommend this as one option to use with a Kruskal-Wallis test; however, I think they labelled epsilon-squared as eta-squared and vice versa.
    
    Parameters
    ----------
    catField : pandas series
        data with categories
    ordField : pandas series
        data with the numeric scores
    categories : list or dictionary, optional
        the categories to use from catField
    levels : list or dictionary, optional
        the levels or order used in ordField
    useRanks : bool, optional
        if True, the scores in ordField are replaced by their mid-ranks before the calculation; if False, the scores are used as given. Default is False.
        
    Returns
    -------
    etaSq : float
        the eta squared value
        
    Notes
    -----
    The formula used (Pearson, 1911, p. 254):
    $$\\eta^2 = \\frac{SS_b}{SS_t}$$
    
    With:
    $$SS_t = \\sum_{j=1}^k \\sum_{i=1}^{n_j} \\left(x_{i,j} - \\bar{x}\\right)^2$$
    $$SS_b = \\sum_{j=1}^k n_j \\times \\left(\\bar{x}_{j} - \\bar{x}\\right)^2$$
    $$\\bar{x}_j = \\frac{\\sum_{i=1}^{n_j} x_{i,j}}{n_j}$$
    $$\\bar{x} = \\frac{\\sum_{j=1}^k n_j\\times\\bar{x}_j}{n} = \\frac{\\sum_{j=1}^k \\sum_{i=1}^{n_j} x_{i,j}}{n}$$
    $$n = \\sum_{j=1}^k n_j$$
    
    Alternative formulas that give the same result:
    $$\\eta^2 = \\frac{F\\times\\left(k - 1\\right)}{F\\times\\left(k - 1\\right)+n-k}$$
    $$\\eta^2 = \\frac{F\\times df_b}{F\\times df_b+df_w}$$
    
    If ranks are used, the eta-squared can also be determined using (Tomczak & Tomczak, 2014, p. 24):
    $$\\eta^2 = \\frac{H}{n - 1}$$
    
    *Symbols used:* 
    
    * \\(n\\), the total sample size
    * \\(k\\), the number of categories
    * \\(SS_b\\), the between sum of squares (the weighted sum of squared deviations of the category means from the overall mean)
    * \\(SS_t\\), the total sum of squares (the sum of squared deviations of all scores from the overall mean)
    * \\(F\\), the F-statistic
    * \\(H\\), H-statistic from Kruskal-Wallis H-test
    * \\(df_i\\), the degrees of freedom of i
    * \\(x_{i,j}\\), the i-th score in category j
    * \\(n_j\\), the number of scores in category j
    * \\(\\bar{x}_j\\), the mean of the scores in category j
    * \\(\\bar{x}\\), the overall mean of all scores
    * \\(b\\), is between = factor = treatment = model
    * \\(w\\), is within = error (the variability within the groups)
    
    References
    ----------
    Kelley, T. L. (1935). An unbiased correlation ratio measure. *Proceedings of the National Academy of Sciences, 21*(9), 554–559. doi:10.1073/pnas.21.9.554
    
    Keselman, H. J. (1975). A Monte Carlo investigation of three estimates of treatment magnitude: Epsilon squared, eta squared, and omega squared. *Canadian Psychological Review / Psychologie Canadienne, 16*(1), 44–48. doi:10.1037/h0081789
    
    Okada, K. (2013). Is omega squared less biased? A comparison of three major effect size indices in one-way anova. *Behaviormetrika, 40*(2), 129–147. doi:10.2333/bhmk.40.129
    
    Pearson, K. (1911). On a correction to be made to the correlation ratio η. *Biometrika, 8*(1/2), 254. doi:10.2307/2331454
    
    Richardson, J. T. E. (2011). Eta squared and partial eta squared as measures of effect size in educational research. *Educational Research Review, 6*(2), 135–147. doi:10.1016/j.edurev.2010.12.001
    
    Tomczak, M., & Tomczak, E. (2014). The need to report effect size estimates revisited. An overview of some recommended measures of effect size. *Trends in Sport Sciences, 1*(21), 19–25.

    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076
    
    '''
    # create the cross table: rows are the levels of ordField, columns the categories;
    # the last row and the last column hold the totals
    ct = tab_cross(ordField, catField, order1=levels, order2=categories, totals="include")
    
    # basic counts
    k = ct.shape[1] - 1
    nlvl = ct.shape[0] - 1
    n = ct.iloc[nlvl, k]
    
    # value per level: its mid-rank if useRanks, otherwise the level itself
    lvlRank = pd.Series(dtype="object")
    cf = 0
    for i in range(0, nlvl):
        if useRanks:
            # tied cases occupy ranks cf + 1 ... cf + count; use their average
            lvlRank.at[i] = (2 * cf + ct.iloc[i, k] + 1) / 2
            cf = cf + ct.iloc[i, k]
        else:
            lvlRank.at[i] = ct.index[i]
    
    # sum and mean of the scores (or ranks) per category
    srj = pd.Series(dtype="object")
    mrj = pd.Series(dtype="object")
    mr = 0
    for j in range(0, k):
        sr = 0
        for i in range(0, nlvl):
            sr = sr + ct.iloc[i, j] * lvlRank.iloc[i]
        srj.at[j] = sr
        mrj.at[j] = sr / ct.iloc[nlvl, j]
        mr = mr + sr
    
    # overall mean
    mr = mr / n
    
    # ss between
    ssb = 0
    for j in range(0, k):
        ssb = ssb + ct.iloc[nlvl, j] * (mrj.iloc[j] - mr)**2
    
    # ss total
    sst = 0
    for i in range(0, nlvl):
        for j in range(0, k):
            sst = sst + ct.iloc[i, j] * (lvlRank.iloc[i] - mr)**2
    
    # results
    etaSq = ssb / sst

    return etaSq
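
The loops above work on the cross table. As a cross-check, the following is a minimal sketch of the same SS_b / SS_t computation on the raw data with plain pandas. It assumes two equally long Series without missing values, scores in ascending order of rank, and no category filtering or custom level order (which `es_eta_sq` handles through `tab_cross`). The helper name `eta_sq_sketch` is hypothetical and not part of this module; `Series.rank(method="average")` is used here to produce the same mid-ranks as the cumulative-count formula `(2 * cf + count + 1) / 2`.

```python
import pandas as pd


def eta_sq_sketch(cat, scores, use_ranks=False):
    """Hypothetical helper: eta-squared = SS_b / SS_t computed on raw data.

    Assumes cat and scores are equally long pandas Series without missing
    values; no category filtering or custom level order is applied.
    """
    df = pd.DataFrame({"cat": cat.to_numpy(), "x": scores.to_numpy(dtype=float)})
    if use_ranks:
        # mid-ranks for ties, in ascending order of the scores
        df["x"] = df["x"].rank(method="average")
    grand_mean = df["x"].mean()
    # total sum of squares: every score against the grand mean
    ss_t = ((df["x"] - grand_mean) ** 2).sum()
    # between sum of squares: n_j times the squared deviation of each category mean
    grp = df.groupby("cat")["x"].agg(["count", "mean"])
    ss_b = (grp["count"] * (grp["mean"] - grand_mean) ** 2).sum()
    return ss_b / ss_t


# usage with a small made-up data set
cat = pd.Series(["a", "a", "b", "b", "c", "c"])
scores = pd.Series([1, 2, 2, 3, 3, 3])
print(eta_sq_sketch(cat, scores))                  # scores as given
print(eta_sq_sketch(cat, scores, use_ranks=True))  # mid-ranks
```

By hand, this made-up example gives SS_b = 7/3 and SS_t = 10/3 on the raw scores (eta-squared 0.7), and SS_b = 10.75 and SS_t = 15 on the mid-ranks (about 0.717), up to floating-point rounding.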

Functions

def es_eta_sq(catField, ordField, categories=None, levels=None, useRanks=False)

Eta Squared

An effect size measure to indicate the strength of the influence of the categories on the ordinal/scale field. A value of 0 indicates no influence, and 1 a perfect relationship.

It is “the proportion of the variation in Y that is associated with membership of the different groups defined by X” (Richardson, 2011, p. 136).

An alternative, epsilon-squared, attempts to make eta-squared unbiased by applying a population correction ratio (Kelley, 1935, p. 557). Although omega-squared is popularly believed to be preferable to epsilon-squared (Keselman, 1975), a later study suggested that epsilon-squared might be preferred (Okada, 2013).
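
To make the correction concrete, here is a minimal plain-Python sketch that computes eta-squared next to epsilon-squared. It assumes the commonly cited form of Kelley's estimate, 1 − (SS_w/(n − k)) / (SS_t/(n − 1)), in which the sums of squares are replaced by their degrees-of-freedom-adjusted counterparts. The function name `eta_and_epsilon_sq` is hypothetical and not part of this module.

```python
def eta_and_epsilon_sq(groups):
    """Hypothetical helper: (eta-squared, epsilon-squared) for a list of score lists."""
    scores = [x for g in groups for x in g]
    n, k = len(scores), len(groups)
    grand_mean = sum(scores) / n
    ss_t = sum((x - grand_mean) ** 2 for x in scores)
    ss_b = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_w = ss_t - ss_b  # within (error) sum of squares
    eta_sq = ss_b / ss_t
    # assumed form of epsilon-squared: variance estimates instead of raw sums of squares
    eps_sq = 1 - (ss_w / (n - k)) / (ss_t / (n - 1))
    return eta_sq, eps_sq


# usage with a small made-up data set of three categories
eta, eps = eta_and_epsilon_sq([[1, 2], [2, 3], [3, 3]])
print(eta, eps)
```

For this example eta-squared is 0.7 and epsilon-squared 0.5 (up to floating-point rounding), showing how the correction shrinks the estimate, especially in small samples.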

Tomczak and Tomczak (2014) recommend this as one option to use with a Kruskal-Wallis test; however, I think they labelled epsilon-squared as eta-squared and vice versa.

Parameters

catField : pandas series
data with categories
ordField : pandas series
data with the numeric scores
categories : list or dictionary, optional
the categories to use from catField
levels : list or dictionary, optional
the levels or order used in ordField
useRanks : bool, optional
if True, the scores in ordField are replaced by their mid-ranks before the calculation; if False, the scores are used as given. Default is False.

Returns

etaSq : float
the eta squared value

Notes

The formula used (Pearson, 1911, p. 254):
$$\eta^2 = \frac{SS_b}{SS_t}$$

With:
$$SS_t = \sum_{j=1}^k \sum_{i=1}^{n_j} \left(x_{i,j} - \bar{x}\right)^2$$
$$SS_b = \sum_{j=1}^k n_j \times \left(\bar{x}_{j} - \bar{x}\right)^2$$
$$\bar{x}_j = \frac{\sum_{i=1}^{n_j} x_{i,j}}{n_j}$$
$$\bar{x} = \frac{\sum_{j=1}^k n_j\times\bar{x}_j}{n} = \frac{\sum_{j=1}^k \sum_{i=1}^{n_j} x_{i,j}}{n}$$
$$n = \sum_{j=1}^k n_j$$

Alternative formulas that give the same result:
$$\eta^2 = \frac{F\times\left(k - 1\right)}{F\times\left(k - 1\right)+n-k}$$
$$\eta^2 = \frac{F\times df_b}{F\times df_b+df_w}$$
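
The equivalence with the F-based formulas can be checked numerically. A minimal plain-Python sketch (the function name `eta_sq_two_ways` is hypothetical) that follows the definitions above: it computes SS_b and SS_t, derives F = (SS_b/df_b)/(SS_w/df_w) with df_b = k − 1 and df_w = n − k, and returns eta-squared both ways.

```python
def eta_sq_two_ways(groups):
    """Hypothetical helper: eta-squared via SS_b / SS_t and via the F-statistic."""
    scores = [x for g in groups for x in g]
    n, k = len(scores), len(groups)
    grand_mean = sum(scores) / n
    ss_t = sum((x - grand_mean) ** 2 for x in scores)
    ss_b = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_w = ss_t - ss_b
    df_b, df_w = k - 1, n - k
    # one-way ANOVA F-statistic: ratio of between and within mean squares
    f = (ss_b / df_b) / (ss_w / df_w)
    return ss_b / ss_t, f * df_b / (f * df_b + df_w)


# usage with a small made-up data set; both values should agree
direct, via_f = eta_sq_two_ways([[1, 2], [2, 3], [3, 3]])
print(direct, via_f)
```

Algebraically, F × df_b = SS_b × df_w / SS_w, so the F-based fraction reduces to SS_b / (SS_b + SS_w) = SS_b / SS_t.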

If ranks are used, the eta-squared can also be determined using (Tomczak & Tomczak, 2014, p. 24):
$$\eta^2 = \frac{H}{n - 1}$$
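
A minimal sketch of this identity in plain Python, with a hypothetical function name: it assigns mid-ranks to tied scores, computes SS_b / SS_t on those ranks, and compares the result with H / (n − 1), where H is the Kruskal-Wallis statistic with the usual tie correction. The identity holds exactly when H includes that tie correction; with uncorrected H and tied scores the two values differ slightly. The sketch assumes at least two distinct scores.

```python
def rank_eta_sq_and_h(groups):
    """Hypothetical helper: (eta-squared on mid-ranks, H / (n - 1))."""
    scores = [x for g in groups for x in g]
    n = len(scores)
    ordered = sorted(scores)
    # mid-rank of each distinct score: average of the positions it occupies
    rank_of = {}
    for v in set(scores):
        first = ordered.index(v) + 1
        rank_of[v] = first + (ordered.count(v) - 1) / 2
    rank_groups = [[rank_of[x] for x in g] for g in groups]
    mean_rank = (n + 1) / 2  # the mean of mid-ranks is always (n + 1) / 2
    ss_b = sum(len(g) * (sum(g) / len(g) - mean_rank) ** 2 for g in rank_groups)
    ss_t = sum((rank_of[x] - mean_rank) ** 2 for x in scores)
    # Kruskal-Wallis H, followed by the usual tie correction
    h = 12 / (n * (n + 1)) * sum(sum(g) ** 2 / len(g) for g in rank_groups) - 3 * (n + 1)
    ties = sum(ordered.count(v) ** 3 - ordered.count(v) for v in set(scores))
    h = h / (1 - ties / (n ** 3 - n))
    return ss_b / ss_t, h / (n - 1)


# usage with a small made-up data set containing ties
eta_r, h_based = rank_eta_sq_and_h([[1, 2], [2, 3], [3, 3]])
print(eta_r, h_based)
```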

Symbols used:

  • n, the total sample size
  • k, the number of categories
  • SS_b, the between sum of squares (the weighted sum of squared deviations of the category means from the overall mean)
  • SS_t, the total sum of squares (the sum of squared deviations of all scores from the overall mean)
  • F, the F-statistic
  • H, H-statistic from Kruskal-Wallis H-test
  • df_i, the degrees of freedom of i
  • x_{i,j}, the i-th score in category j
  • n_j, the number of scores in category j
  • \bar{x}_j, the mean of the scores in category j
  • \bar{x}, the overall mean of all scores
  • b, is between = factor = treatment = model
  • w, is within = error (the variability within the groups)

References

Kelley, T. L. (1935). An unbiased correlation ratio measure. Proceedings of the National Academy of Sciences, 21(9), 554–559. doi:10.1073/pnas.21.9.554

Keselman, H. J. (1975). A Monte Carlo investigation of three estimates of treatment magnitude: Epsilon squared, eta squared, and omega squared. Canadian Psychological Review / Psychologie Canadienne, 16(1), 44–48. doi:10.1037/h0081789

Okada, K. (2013). Is omega squared less biased? A comparison of three major effect size indices in one-way anova. Behaviormetrika, 40(2), 129–147. doi:10.2333/bhmk.40.129

Pearson, K. (1911). On a correction to be made to the correlation ratio η. Biometrika, 8(1/2), 254. doi:10.2307/2331454

Richardson, J. T. E. (2011). Eta squared and partial eta squared as measures of effect size in educational research. Educational Research Review, 6(2), 135–147. doi:10.1016/j.edurev.2010.12.001

Tomczak, M., & Tomczak, E. (2014). The need to report effect size estimates revisited. An overview of some recommended measures of effect size. Trends in Sport Sciences, 1(21), 19–25.

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
