Module stikpetP.effect_sizes.eff_size_eta_sq
import pandas as pd
from ..other.table_cross import tab_cross
def es_eta_sq(catField, ordField, categories=None, levels=None, useRanks=False):
    '''
    Eta Squared
    -----------
    An effect size measure that indicates the strength of the influence of the categories on the ordinal/scale field. A value of 0 indicates no influence, and 1 a perfect relationship.
    It is “the proportion of the variation in Y that is associated with membership of the different groups defined by X” (Richardson, 2011, p. 136).
    An alternative, epsilon-squared, is an attempt to make eta-squared unbiased (applying a population correction ratio) (Kelley, 1935, p. 557). Although a popular belief is that omega-squared is preferred over epsilon-squared (Keselman, 1975), a later study showed that epsilon-squared might be preferred (Okada, 2013).
    Tomczak and Tomczak (2014) recommend this as one option to use with a Kruskal-Wallis test; however, I think they labelled epsilon-squared as eta-squared and the other way around.
    Parameters
    ----------
    catField : pandas series
        data with categories
    ordField : pandas series
        data with the numeric scores
    categories : list or dictionary, optional
        the categories to use from catField
    levels : list or dictionary, optional
        the levels or order used in ordField
    useRanks : boolean, optional
        use ranks, or use the scores as given in ordField. Default is False.
    Returns
    -------
    etaSq : float
        the eta squared value
    Notes
    -----
    The formula used (Pearson, 1911, p. 254):
    $$\\eta^2 = \\frac{SS_b}{SS_t}$$
    With:
    $$SS_t = \\sum_{j=1}^k \\sum_{i=1}^{n_j} \\left(x_{i,j} - \\bar{x}\\right)^2$$
    $$SS_b = \\sum_{j=1}^k n_j \\times \\left(\\bar{x}_{j} - \\bar{x}\\right)^2$$
    $$\\bar{x}_j = \\frac{\\sum_{i=1}^{n_j} x_{i,j}}{n_j}$$
    $$\\bar{x} = \\frac{\\sum_{j=1}^k n_j\\times\\bar{x}_j}{n} = \\frac{\\sum_{j=1}^k \\sum_{i=1}^{n_j} x_{i,j}}{n}$$
    $$n = \\sum_{j=1}^k n_j$$
    Alternative formulas that give the same result include:
    $$\\eta^2 = \\frac{F\\times\\left(k - 1\\right)}{F\\times\\left(k - 1\\right)+n-k}$$
    $$\\eta^2 = \\frac{F\\times df_b}{F\\times df_b+df_w}$$
    If ranks are used, eta-squared can also be determined using (Tomczak & Tomczak, 2014, p. 24):
    $$\\eta^2 = \\frac{H}{n - 1}$$
    *Symbols used:*
    * \\(n\\), the total sample size
    * \\(k\\), the number of categories
    * \\(SS_b\\), the between sum of squares (sum of squared deviations of the category means from the overall mean, weighted by category size)
    * \\(SS_t\\), the total sum of squares (sum of squared deviations of the scores from the overall mean)
    * \\(F\\), the F-statistic
    * \\(H\\), the H-statistic from the Kruskal-Wallis H-test
    * \\(df_i\\), the degrees of freedom of i, with i either b or w
    * \\(x_{i,j}\\), the i-th score in category j
    * \\(n_j\\), the number of scores in category j
    * \\(\\bar{x}_j\\), the mean of the scores in category j
    * \\(\\bar{x}\\), the mean of all scores
    * \\(b\\), is between = factor = treatment = model
    * \\(w\\), is within = error (the variability within the groups)
    References
    ----------
    Kelley, T. L. (1935). An unbiased correlation ratio measure. *Proceedings of the National Academy of Sciences, 21*(9), 554–559. doi:10.1073/pnas.21.9.554
    Keselman, H. J. (1975). A Monte Carlo investigation of three estimates of treatment magnitude: Epsilon squared, eta squared, and omega squared. *Canadian Psychological Review / Psychologie Canadienne, 16*(1), 44–48. doi:10.1037/h0081789
    Okada, K. (2013). Is omega squared less biased? A comparison of three major effect size indices in one-way anova. *Behaviormetrika, 40*(2), 129–147. doi:10.2333/bhmk.40.129
    Pearson, K. (1911). On a correction to be made to the correlation ratio η. *Biometrika, 8*(1/2), 254. doi:10.2307/2331454
    Richardson, J. T. E. (2011). Eta squared and partial eta squared as measures of effect size in educational research. *Educational Research Review, 6*(2), 135–147. doi:10.1016/j.edurev.2010.12.001
    Tomczak, M., & Tomczak, E. (2014). The need to report effect size estimates revisited. An overview of some recommended measures of effect size. *Trends in Sport Sciences, 1*(21), 19–25.
    Author
    ------
    Made by P. Stikker
    Companion website: https://PeterStatistics.com
    YouTube channel: https://www.youtube.com/stikpet
    Donations: https://www.patreon.com/bePatron?u=19398076
    '''
    # create the cross table: rows are the levels of ordField, columns the categories,
    # with a totals row and a totals column appended
    ct = tab_cross(ordField, catField, order1=levels, order2=categories, totals="include")

    # basic counts (the last row and column hold the totals)
    k = ct.shape[1] - 1      # number of categories
    nlvl = ct.shape[0] - 1   # number of levels
    n = ct.iloc[nlvl, k]     # total sample size

    # value used for each level: its midrank if useRanks, otherwise the level itself
    lvlRank = pd.Series(dtype="object")
    cf = 0  # cumulative frequency of the lower levels
    for i in range(0, nlvl):
        if useRanks:
            lvlRank.at[i] = (2 * cf + ct.iloc[i, k] + 1) / 2
            cf = cf + ct.iloc[i, k]
        else:
            lvlRank.at[i] = ct.index[i]

    # sum of ranks (or scores) per category, and the means
    srj = pd.Series(dtype="object")
    mrj = pd.Series(dtype="object")
    n = 0
    mr = 0
    for j in range(0, k):
        sr = 0
        for i in range(0, nlvl):
            sr = sr + ct.iloc[i, j] * lvlRank.iloc[i]
        srj.at[j] = sr
        mrj.at[j] = sr / ct.iloc[nlvl, j]
        mr = mr + sr
        n = n + ct.iloc[nlvl, j]
    mr = mr / n  # overall mean

    # ss between
    ssb = 0
    for j in range(0, k):
        ssb = ssb + ct.iloc[nlvl, j] * (mrj.iloc[j] - mr)**2

    # ss total
    sst = 0
    for i in range(0, nlvl):
        for j in range(0, k):
            sst = sst + ct.iloc[i, j] * (lvlRank.iloc[i] - mr)**2

    # results
    etaSq = ssb / sst
    return etaSq
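When `useRanks` is set, the source above gives each level the average rank of its block of tied scores, computed from cumulative frequencies. A minimal standalone sketch of that one step, assuming the level totals are available as a plain list in ascending level order (the helper name `midranks_from_counts` is hypothetical and not part of stikpetP):

```python
def midranks_from_counts(counts):
    """Return the average rank of each level, given the number of scores per level.

    The levels are assumed to be listed in ascending order.
    """
    ranks = []
    cf = 0  # cumulative frequency of all lower levels
    for f in counts:
        # the f tied scores occupy ranks cf+1 .. cf+f; their mean is (2*cf + f + 1) / 2
        ranks.append((2 * cf + f + 1) / 2)
        cf += f
    return ranks


level_counts = [1, 3, 1, 2, 2]  # illustrative totals for five ordered levels
level_ranks = midranks_from_counts(level_counts)
print(level_ranks)  # [1.0, 3.0, 5.0, 6.5, 8.5]
```

Working from counts rather than from the raw scores avoids sorting the data, which matches how the function operates on the cross table.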
Functions
def es_eta_sq(catField, ordField, categories=None, levels=None, useRanks=False)
Eta Squared
An effect size measure that indicates the strength of the influence of the categories on the ordinal/scale field. A value of 0 indicates no influence, and 1 a perfect relationship.
It is “the proportion of the variation in Y that is associated with membership of the different groups defined by X” (Richardson, 2011, p. 136).
An alternative, epsilon-squared, is an attempt to make eta-squared unbiased (applying a population correction ratio) (Kelley, 1935, p. 557). Although a popular belief is that omega-squared is preferred over epsilon-squared (Keselman, 1975), a later study showed that epsilon-squared might be preferred (Okada, 2013).
Tomczak and Tomczak (2014) recommend this as one option to use with a Kruskal-Wallis test; however, I think they labelled epsilon-squared as eta-squared and the other way around.
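To make the relation between the two measures concrete: epsilon-squared can be written as eta-squared with a degrees-of-freedom correction, \(\epsilon^2 = 1 - \left(1 - \eta^2\right)\times\frac{n - 1}{n - k}\). The sketch below is an illustration only; the function name `epsilon_sq_from_eta_sq` is hypothetical, and this conversion is not something `es_eta_sq` performs.

```python
def epsilon_sq_from_eta_sq(eta_sq, n, k):
    """Epsilon-squared from eta-squared, total sample size n and k categories."""
    return 1 - (1 - eta_sq) * (n - 1) / (n - k)


eta_sq = 0.9   # illustrative eta-squared value
n, k = 9, 3    # illustrative sample size and number of categories
eps_sq = epsilon_sq_from_eta_sq(eta_sq, n, k)
print(round(eps_sq, 4))  # 0.8667
```

Substituting the rank-based \(\eta^2 = \frac{H}{n - 1}\) into this expression gives \(\frac{H - k + 1}{n - k}\), so on ranks the two measures differ only by this correction.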
Parameters
catField : pandas series
    data with categories
ordField : pandas series
    data with the numeric scores
categories : list or dictionary, optional
    the categories to use from catField
levels : list or dictionary, optional
    the levels or order used in ordField
useRanks : boolean, optional
    use ranks, or use the scores as given in ordField. Default is False.
Returns
etaSq : float
    the eta squared value
Notes
The formula used (Pearson, 1911, p. 254):
$$\eta^2 = \frac{SS_b}{SS_t}$$
With:
$$SS_t = \sum_{j=1}^k \sum_{i=1}^{n_j} \left(x_{i,j} - \bar{x}\right)^2$$
$$SS_b = \sum_{j=1}^k n_j \times \left(\bar{x}_{j} - \bar{x}\right)^2$$
$$\bar{x}_j = \frac{\sum_{i=1}^{n_j} x_{i,j}}{n_j}$$
$$\bar{x} = \frac{\sum_{j=1}^k n_j\times\bar{x}_j}{n} = \frac{\sum_{j=1}^k \sum_{i=1}^{n_j} x_{i,j}}{n}$$
$$n = \sum_{j=1}^k n_j$$
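The sums of squares defined above can be computed directly from raw scores. The following is a minimal pure-Python sketch, assuming the data are already split into one list of scores per category; the name `eta_squared` is hypothetical, and this is not the cross-table implementation that `es_eta_sq` uses.

```python
def eta_squared(groups):
    """Eta-squared = SS_b / SS_t for a list of score lists (one list per category)."""
    scores = [x for g in groups for x in g]
    n = len(scores)
    grand_mean = sum(scores) / n
    # total sum of squares: every score against the overall mean
    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    # between sum of squares: every category mean against the overall mean, weighted by n_j
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total


groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # illustrative data
print(eta_squared(groups))  # 0.9
```

Here the category means are 2, 5 and 8 with an overall mean of 5, so \(SS_b = 54\) and \(SS_t = 60\).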
Alternative formulas that give the same result include:
$$\eta^2 = \frac{F\times\left(k - 1\right)}{F\times\left(k - 1\right)+n-k}$$
$$\eta^2 = \frac{F\times df_b}{F\times df_b+df_w}$$
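That the F-based form agrees with \(SS_b / SS_t\) can be checked numerically. A small sketch, assuming a one-way layout with raw scores per category (the function name `one_way_f_and_eta_sq` is hypothetical):

```python
def one_way_f_and_eta_sq(groups):
    """Return (F, eta-squared from sums of squares, eta-squared from F)."""
    scores = [x for g in groups for x in g]
    n, k = len(scores), len(groups)
    grand_mean = sum(scores) / n
    ss_b = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_w = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_b, df_w = k - 1, n - k
    f = (ss_b / df_b) / (ss_w / df_w)
    eta_direct = ss_b / (ss_b + ss_w)        # SS_t = SS_b + SS_w
    eta_from_f = f * df_b / (f * df_b + df_w)
    return f, eta_direct, eta_from_f


f_stat, eta_direct, eta_from_f = one_way_f_and_eta_sq([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f_stat, round(eta_direct, 4), round(eta_from_f, 4))
```

Both routes should give the same value, because \(F\times df_b = SS_b \times df_w / SS_w\); dividing numerator and denominator by \(df_w / SS_w\) recovers \(SS_b / (SS_b + SS_w)\).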
If ranks are used, eta-squared can also be determined using (Tomczak & Tomczak, 2014, p. 24):
$$\eta^2 = \frac{H}{n - 1}$$
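The rank-based shortcut can be checked against the sums-of-squares definition applied to ranks. The sketch below assumes that H is the tie-corrected Kruskal-Wallis statistic and that ties receive average ranks; all function names are hypothetical and the data are made up for illustration.

```python
def midranks(values):
    """Map each distinct value to its average rank (tied values share the mean rank)."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        ranks[ordered[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a list of score lists."""
    scores = [x for g in groups for x in g]
    n = len(scores)
    rank_of = midranks(scores)
    rank_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    tie_sizes = [scores.count(v) for v in set(scores)]
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    return h / correction


def eta_squared_on_ranks(groups):
    """SS_b / SS_t computed on the average ranks instead of the raw scores."""
    scores = [x for g in groups for x in g]
    rank_of = midranks(scores)
    ranked = [[rank_of[x] for x in g] for g in groups]
    mean_rank = (len(scores) + 1) / 2
    ss_t = sum((r - mean_rank) ** 2 for g in ranked for r in g)
    ss_b = sum(len(g) * (sum(g) / len(g) - mean_rank) ** 2 for g in ranked)
    return ss_b / ss_t


data = [[1, 2, 2], [2, 3, 4], [4, 5, 5]]  # illustrative data with ties
n_total = sum(len(g) for g in data)
via_h = kruskal_wallis_h(data) / (n_total - 1)
via_ss = eta_squared_on_ranks(data)
print(round(via_h, 4), round(via_ss, 4))  # 0.7982 0.7982
```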
Symbols used:
- \(n\), the total sample size
- \(k\), the number of categories
- \(SS_b\), the between sum of squares (sum of squared deviations of the category means from the overall mean, weighted by category size)
- \(SS_t\), the total sum of squares (sum of squared deviations of the scores from the overall mean)
- \(F\), the F-statistic
- \(H\), the H-statistic from the Kruskal-Wallis H-test
- \(df_i\), the degrees of freedom of i, with i either b or w
- \(x_{i,j}\), the i-th score in category j
- \(n_j\), the number of scores in category j
- \(\bar{x}_j\), the mean of the scores in category j
- \(\bar{x}\), the mean of all scores
- \(b\), is between = factor = treatment = model
- \(w\), is within = error (the variability within the groups)
References
Kelley, T. L. (1935). An unbiased correlation ratio measure. Proceedings of the National Academy of Sciences, 21(9), 554–559. doi:10.1073/pnas.21.9.554
Keselman, H. J. (1975). A Monte Carlo investigation of three estimates of treatment magnitude: Epsilon squared, eta squared, and omega squared. Canadian Psychological Review / Psychologie Canadienne, 16(1), 44–48. doi:10.1037/h0081789
Okada, K. (2013). Is omega squared less biased? A comparison of three major effect size indices in one-way anova. Behaviormetrika, 40(2), 129–147. doi:10.2333/bhmk.40.129
Pearson, K. (1911). On a correction to be made to the correlation ratio η. Biometrika, 8(1/2), 254. doi:10.2307/2331454
Richardson, J. T. E. (2011). Eta squared and partial eta squared as measures of effect size in educational research. Educational Research Review, 6(2), 135–147. doi:10.1016/j.edurev.2010.12.001
Tomczak, M., & Tomczak, E. (2014). The need to report effect size estimates revisited. An overview of some recommended measures of effect size. Trends in Sport Sciences, 1(21), 19–25.
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076