Module stikpetP.tests.test_welch_t_is
from statistics import mean, variance
from scipy.stats import t
import pandas as pd
def ts_welch_t_is(catField, scaleField, categories=None, dmu=0, df_ver='ws'):
'''
Welch t Test (Independent Samples)
----------------------------------
A test to compare two means. The null hypothesis would be that the means of each category are equal in the population.
Unlike the Student t-test, the Welch test does not assume the variances of the two categories to be equal in the population. Ruxton (2006) even argues that the Welch t-test should always be preferred over the Student t-test.
However, unlike a trimmed means test, it does require the assumption of normality.
The test is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Tests/WelchIS.html)
Parameters
----------
catField : dataframe or list
the categorical data
scaleField : dataframe or list
the scores
categories : list, optional
to indicate which two categories of catField to use; otherwise the two most frequent categories found will be used.
dmu : float, optional
difference according to null hypothesis (default is 0)
df_ver : {'ws', 'welch-satterthwaite', 'w', 'welch'}, optional
version of degrees of freedom to use (default is 'ws', the Welch-Satterthwaite version)
Returns
-------
A dataframe with:
* *n cat. 1*, the sample size of the first category
* *n cat. 2*, the sample size of the second category
* *mean cat. 1*, the sample mean of the first category
* *mean cat. 2*, the sample mean of the second category
* *diff.*, difference between the two sample means
* *hyp. diff.*, hypothesized difference between the two population means
* *statistic*, the test statistic (t-value)
* *df*, the degrees of freedom
* *p-value*, the significance (p-value)
* *test*, name of test used
Notes
-----
The formula used is:
$$t = \\frac{\\bar{x}_1 - \\bar{x}_2}{SE}$$
$$df = \\frac{SE^4}{\\frac{\\left(s_1^2\\right)^2}{n_1^2\\times\\left(n_1 - 1\\right)} + \\frac{\\left(s_2^2\\right)^2}{n_2^2\\times\\left(n_2 - 1\\right)}}$$
$$sig. = 2\\times\\left(1 - T\\left(\\left|t\\right|, df\\right)\\right)$$
With:
$$SE = \\sqrt{\\frac{s_1^2}{n_1} + \\frac{s_2^2}{n_2}}$$
$$s_i^2 = \\frac{\\sum_{j=1}^{n_i} \\left(x_{i,j} - \\bar{x}_i\\right)^2}{n_i - 1}$$
$$\\bar{x}_i = \\frac{\\sum_{j=1}^{n_i} x_{i,j}}{n_i}$$
*Symbols used:*
* \\(x_{i,j}\\), the j-th score in category i
* \\(n_i\\), the number of scores in category i
* \\(T\\left(\\dots\\right)\\), the cumulative distribution function of the t-distribution
The degrees of freedom can be found in Welch (1938, p. 353, eq. 9; 1947, p. 32, eq. 28) and Satterthwaite (1946, p. 114, eq. 17). Welch (1947, p. 32, eq. 29) also suggests another formula for the degrees of freedom, which can be written as:
$$df_w = \\frac{\\left( \\frac{s_1^2}{n_1}+\\frac{s_2^2}{n_2}\\right)^2}{\\frac{\\left(s_1^2\\right)^2}{n_1^2 \\times \\left(n_1 + 1\\right)} + \\frac{\\left(s_2^2\\right)^2}{n_2^2 \\times \\left(n_2 + 1\\right)}} - 2$$
Later, Aspin and Welch (1949, p. 295) indicated that this alternative version has little to no advantage over the other version.
Before, After and Alternatives
------------------------------
Before this test you might want some descriptive measures. Use [me_mode_bin](../measures/meas_mode_bin.html#me_mode_bin) for the Mode for Binned Data, [me_mean](../measures/meas_mean.html#me_mean) for different types of mean, and/or [me_variation](../measures/meas_variation.html#me_variation) for different Measures of Quantitative Variation.
For a visualisation you could use [vi_boxplot_single](../visualisations/vis_boxplot_single.html#vi_boxplot_single) for a Box (and Whisker) Plot, or [vi_histogram](../visualisations/vis_histogram.html#vi_histogram) for a Histogram.
After the test you might want an effect size measure, options include: [Common Language](../effect_sizes/eff_size_common_language_is.html), [Cohen d_s](../effect_sizes/eff_size_hedges_g_is.html), [Cohen U](../effect_sizes/eff_size_cohen_u.html), [Hedges g](../effect_sizes/eff_size_hedges_g_is.html), [Glass delta](../effect_sizes/eff_size_glass_delta.html), [biserial correlation](../correlations/cor_biserial.html), [point-biserial correlation](../effect_sizes/cor_point_biserial.html)
There are four similar tests, with different assumptions.
|test|equal variance|normality|
|-------|-----------|---------|
|[Student t](../tests/test_student_t_is.html)| yes | yes|
|[Welch t](../tests/test_welch_t_is.html) | no | yes|
|[Trimmed means](../tests/test_trimmed_mean_is.html) | yes | no |
|[Yuen-Welch](../tests/test_trimmed_mean_is.html)|no | no |
Another test that in some cases could be used is the [Z test](../tests/test_z_is.html)
References
----------
Aspin, A. A., & Welch, B. L. (1949). Tables for use in comparisons whose accuracy involves two variances, separately estimated. *Biometrika, 36*(3/4), 290. https://doi.org/10.2307/2332668
Ruxton, G. D. (2006). The unequal variance t-test is an underused alternative to Student’s t-test and the Mann–Whitney U test. *Behavioral Ecology, 17*(4), 688–690. https://doi.org/10.1093/beheco/ark016
Satterthwaite, F. E. (1946). An approximate distribution of estimates of variance components. *Biometrics Bulletin, 2*(6), 110. https://doi.org/10.2307/3002019
Welch, B. L. (1938). The significance of the difference between two means when the population variances are unequal. *Biometrika, 29*(3–4), 350–362. https://doi.org/10.1093/biomet/29.3-4.350
Welch, B. L. (1947). The generalization of ‘Student’s’ problem when several different population variances are involved. *Biometrika, 34*(1/2), 28–35. https://doi.org/10.2307/2332510
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
--------
Example 1: Dataframe
>>> file1 = "https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv"
>>> df1 = pd.read_csv(file1, sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = df1['age']
>>> ex1 = ex1.replace("89 OR OLDER", "90")
>>> ts_welch_t_is(df1['sex'], ex1)
n FEMALE n MALE mean FEMALE mean MALE diff. hyp. diff. statistic df p-value test
0 1083 886 48.561404 47.760722 0.800681 0 0.998958 1894.978467 0.317942 Welch independent samples t-test
Example 2: List
>>> scores = [20,50,80,15,40,85,30,45,70,60, None, 90,25,40,70,65, None, 70,98,40]
>>> groups = ["nat.","int.","int.","nat.","int.", "int.","nat.","nat.","int.","int.","int.","int.","int.","int.","nat.", "int." ,None,"nat.","int.","int."]
>>> ts_welch_t_is(groups, scores)
n int. n nat. mean int. mean nat. diff. hyp. diff. statistic df p-value test
0 12 6 61.916667 41.666667 20.25 0 1.69314 9.750994 0.122075 Welch independent samples t-test
'''
#convert to pandas series if needed
if type(catField) is list:
catField = pd.Series(catField)
if type(scaleField) is list:
scaleField = pd.Series(scaleField)
#combine as one dataframe
df = pd.concat([catField, scaleField], axis=1)
df = df.dropna()
#the two categories
if categories is not None:
cat1 = categories[0]
cat2 = categories[1]
else:
cat1 = df.iloc[:,0].value_counts().index[0]
cat2 = df.iloc[:,0].value_counts().index[1]
#separate the scores for each category
x1 = list(df.iloc[:,1][df.iloc[:,0] == cat1])
x2 = list(df.iloc[:,1][df.iloc[:,0] == cat2])
#make sure they are floats
x1 = [float(x) for x in x1]
x2 = [float(x) for x in x2]
n1 = len(x1)
n2 = len(x2)
n = n1 + n2
m1 = mean(x1)
m2 = mean(x2)
var1 = variance(x1)
var2 = variance(x2)
sse = var1/n1 + var2/n2
se = (sse)**0.5
tValue = (m1 - m2 - dmu)/se
if df_ver=='ws' or df_ver=='welch-satterthwaite':
df = sse**2/(var1**2/(n1**2*(n1 - 1)) + var2**2/(n2**2*(n2 - 1)))
elif df_ver=='w' or df_ver=='welch':
df = sse**2/(var1**2/(n1**2*(n1 + 1)) + var2**2/(n2**2*(n2 + 1))) - 2
else:
raise ValueError("df_ver should be one of 'ws', 'welch-satterthwaite', 'w', or 'welch'")
pValue = 2*(1-t.cdf(abs(tValue), df))
statistic = tValue
testUsed = "Welch independent samples t-test"
colnames = ["n "+cat1, "n "+cat2, "mean "+cat1, "mean "+cat2, "diff.", "hyp. diff.", "statistic", "df", "p-value", "test"]
results = pd.DataFrame([[n1, n2, m1, m2, m1 - m2, dmu, statistic, df, pValue, testUsed]], columns=colnames)
return results
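A quick way to validate the formulas above: scipy's own Welch test, `scipy.stats.ttest_ind` with `equal_var=False`, uses the same statistic and the same Welch-Satterthwaite degrees of freedom, so a manual computation following the Notes section should reproduce it exactly. A minimal sketch with made-up sample data (the values below are for illustration only):

```python
from statistics import mean, variance
from scipy.stats import t, ttest_ind

# two small made-up samples with clearly unequal variances
x1 = [20.0, 35.0, 50.0, 15.0, 40.0, 30.0]
x2 = [70.0, 85.0, 60.0, 90.0, 65.0, 98.0, 45.0, 80.0]

n1, n2 = len(x1), len(x2)
v1, v2 = variance(x1), variance(x2)       # unbiased sample variances (n - 1 denominator)
sse = v1/n1 + v2/n2                       # SE^2
t_val = (mean(x1) - mean(x2)) / sse**0.5  # Welch t statistic

# Welch-Satterthwaite degrees of freedom (the default 'ws' version)
df_ws = sse**2 / (v1**2/(n1**2*(n1 - 1)) + v2**2/(n2**2*(n2 - 1)))
p_val = 2*(1 - t.cdf(abs(t_val), df_ws))

# Welch (1947, eq. 29) alternative ('w' version)
df_w = sse**2 / (v1**2/(n1**2*(n1 + 1)) + v2**2/(n2**2*(n2 + 1))) - 2

# scipy's Welch test uses the same statistic and 'ws' degrees of freedom
res = ttest_ind(x1, x2, equal_var=False)
print(t_val, res.statistic)  # should agree
print(p_val, res.pvalue)     # should agree
print(df_ws, df_w)           # the two df versions are close, but not equal
```

The two degrees-of-freedom versions give similar but not identical values, which is why Aspin and Welch (1949) saw little advantage in the alternative version.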