Module stikpetP.tests.test_welch_t_is

from statistics import mean, variance
from scipy.stats import t 
import pandas as pd

def ts_welch_t_is(catField, scaleField, categories=None, dmu=0, df_ver='ws'):
    '''
    Welch t Test (Independent Samples)
    ----------------------------------
    A test to compare two means. The null hypothesis would be that the means of each category are equal in the population.
    
    Unlike the Student t-test, the Welch test does not assume the variances of the two categories to be equal in the population. Ruxton (2006) even argues that the Welch t-test should always be preferred over the Student t-test.

    However, unlike a trimmed means test, it does require the assumption of normality.

    The test is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Tests/WelchIS.html)
    
    Parameters
    ----------
    catField : dataframe or list 
        the categorical data
    scaleField : dataframe or list
        the scores
    categories : list, optional 
        to indicate which two categories of catField to use; otherwise the two most frequently occurring categories will be used.
    dmu : float, optional 
        difference according to null hypothesis (default is 0)
    df_ver : {'ws', 'welch-satterthwaite', 'w', 'welch'}, optional
        version of degrees of freedom to use: 'ws' or 'welch-satterthwaite' for the Welch-Satterthwaite approximation (default), 'w' or 'welch' for Welch's alternative version
        
    Returns
    -------
    A dataframe with:
    
    * *n cat. 1*, the sample size of the first category
    * *n cat. 2*, the sample size of the second category
    * *mean cat. 1*, the sample mean of the first category
    * *mean cat. 2*, the sample mean of the second category
    * *diff.*, difference between the two sample means
    * *hyp. diff.*, hypothesized difference between the two population means
    * *statistic*, the test statistic (t-value)
    * *df*, the degrees of freedom
    * *p-value*, the significance (p-value)
    * *test*, name of test used
    
    Notes
    -----
    The formula used is:
    $$t = \\frac{\\bar{x}_1 - \\bar{x}_2}{SE}$$
    $$df = \\frac{SE^4}{\\frac{\\left(s_1^2\\right)^2}{n_1^2\\times\\left(n_1 - 1\\right)} + \\frac{\\left(s_2^2\\right)^2}{n_2^2\\times\\left(n_2 - 1\\right)}}$$
    $$sig. = 2\\times\\left(1 - T\\left(\\left|t\\right|, df\\right)\\right)$$
    
    With:
    $$SE = \\sqrt{\\frac{s_1^2}{n_1} + \\frac{s_2^2}{n_2}}$$
    $$s_i^2 = \\frac{\\sum_{j=1}^{n_i} \\left(x_{i,j} - \\bar{x}_i\\right)^2}{n_i - 1}$$
    $$\\bar{x}_i = \\frac{\\sum_{j=1}^{n_i} x_{i,j}}{n_i}$$
    
    *Symbols used:*
    
    * \\(x_{i,j}\\), the j-th score in category i
    * \\(n_i\\), the number of scores in category i
    * \\(T\\left(\\dots\\right)\\), the cumulative distribution function of the t-distribution
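
As a numeric illustration of the formulas above, the t statistic, Welch-Satterthwaite degrees of freedom, and p-value can be computed by hand and cross-checked against `scipy.stats.ttest_ind` with `equal_var=False` (the scores below are made up purely for illustration):

```python
from statistics import mean, variance
from scipy.stats import t, ttest_ind

# made-up scores for two groups, purely for illustration
x1 = [20.0, 50.0, 80.0, 15.0, 40.0]
x2 = [30.0, 45.0, 70.0, 60.0, 90.0, 25.0]
n1, n2 = len(x1), len(x2)

s1, s2 = variance(x1), variance(x2)   # sample variances, n - 1 in the denominator
sse = s1/n1 + s2/n2
se = sse**0.5                         # the SE from the formula above
tval = (mean(x1) - mean(x2))/se       # t statistic, hypothesized difference of 0
df = sse**2/(s1**2/(n1**2*(n1 - 1)) + s2**2/(n2**2*(n2 - 1)))
p = 2*(1 - t.cdf(abs(tval), df))

# scipy's Welch test should reproduce the same statistic and p-value
res = ttest_ind(x1, x2, equal_var=False)
print(round(tval, 6), round(p, 6))
```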

    The degrees of freedom can be found in Welch (1938, p. 353, eq. 9; 1947, p. 32, eq. 28) and Satterthwaite (1946, p. 114, eq. 17). Welch (1947, p. 32, eq. 29) also suggests another formula for the degrees of freedom, which can be written as:
    $$df_w = \\frac{\\left( \\frac{s_1^2}{n_1}+\\frac{s_2^2}{n_2}\\right)^2}{\\frac{s_1^2}{n_1^2 \\times \\left(n_1 + 1\\right)} + \\frac{s_2^2}{n_2^2 \\times \\left(n_2 + 1\\right)}} - 2$$

    Later, Aspin and Welch (1949, p. 295) indicated that this alternative version has little to no advantage over the other version.
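
A quick sketch (again with made-up numbers) of how the two versions of the degrees of freedom compare:

```python
from statistics import variance

# made-up scores, purely to illustrate the two df versions
x1 = [20.0, 50.0, 80.0, 15.0, 40.0, 85.0]
x2 = [30.0, 45.0, 70.0, 60.0, 90.0]
n1, n2 = len(x1), len(x2)
s1, s2 = variance(x1), variance(x2)

sse = s1/n1 + s2/n2
# Welch-Satterthwaite version (df_ver='ws', the default)
df_ws = sse**2/(s1**2/(n1**2*(n1 - 1)) + s2**2/(n2**2*(n2 - 1)))
# Welch's 1947 alternative (df_ver='w')
df_w = sse**2/(s1**2/(n1**2*(n1 + 1)) + s2**2/(n2**2*(n2 + 1))) - 2
print(round(df_ws, 3), round(df_w, 3))
```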

    Before, After and Alternatives
    ------------------------------
    Before this you might want some descriptive measures. Use [me_mode_bin](../measures/meas_mode_bin.html#me_mode_bin) for the Mode for Binned Data, [me_mean](../measures/meas_mean.html#me_mean) for different types of mean, and/or [me_variation](../measures/meas_variation.html#me_variation) for different Measures of Quantitative Variation.
    
    For a visualisation, consider [vi_boxplot_single](../visualisations/vis_boxplot_single.html#vi_boxplot_single) for a Box (and Whisker) Plot or [vi_histogram](../visualisations/vis_histogram.html#vi_histogram) for a Histogram.
    
    After the test you might want an effect size measure, options include: [Common Language](../effect_sizes/eff_size_common_language_is.html), [Cohen d_s](../effect_sizes/eff_size_hedges_g_is.html), [Cohen U](../effect_sizes/eff_size_cohen_u.html), [Hedges g](../effect_sizes/eff_size_hedges_g_is.html), [Glass delta](../effect_sizes/eff_size_glass_delta.html), [biserial correlation](../correlations/cor_biserial.html), [point-biserial correlation](../effect_sizes/cor_point_biserial.html)
    
    There are four similar tests, with different assumptions. 
    
    |test|equal variance|normality|
    |-------|-----------|---------|
    |[Student t](../tests/test_student_t_is.html)| yes | yes|
    |[Welch t](../tests/test_welch_t_is.html) | no | yes|
    |[Trimmed means](../tests/test_trimmed_mean_is.html) | yes | no | 
    |[Yuen-Welch](../tests/test_trimmed_mean_is.html)|no | no |

    Another test that in some cases could be used is the [Z test](../tests/test_z_is.html)
    
    References
    ----------
    Aspin, A. A., & Welch, B. L. (1949). Tables for use in comparisons whose accuracy involves two variances, separately estimated. *Biometrika, 36*(3/4), 290. https://doi.org/10.2307/2332668
    
    Ruxton, G. D. (2006). The unequal variance t-test is an underused alternative to Student’s t-test and the Mann–Whitney U test. *Behavioral Ecology, 17*(4), 688–690. https://doi.org/10.1093/beheco/ark016

    Satterthwaite, F. E. (1946). An approximate distribution of estimates of variance components. *Biometrics Bulletin, 2*(6), 110. https://doi.org/10.2307/3002019
    
    Welch, B. L. (1938). The significance of the difference between two means when the population variances are unequal. *Biometrika, 29*(3–4), 350–362. https://doi.org/10.1093/biomet/29.3-4.350

    Welch, B. L. (1947). The generalization of ‘Student’s’ problem when several different population variances are involved. *Biometrika, 34*(1/2), 28–35. https://doi.org/10.2307/2332510
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076
    
    Examples
    --------
    Example 1: Dataframe
    >>> file1 = "https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv"
    >>> df1 = pd.read_csv(file1, sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
    >>> ex1 = df1['age']
    >>> ex1 = ex1.replace("89 OR OLDER", "90")
    >>> ts_welch_t_is(df1['sex'], ex1)
       n FEMALE  n MALE  mean FEMALE  mean MALE     diff.  hyp. diff.  statistic           df   p-value                              test
    0      1083     886    48.561404  47.760722  0.800681           0   0.998958  1894.978467  0.317942  Welch independent samples t-test
    
    Example 2: List
    >>> scores = [20,50,80,15,40,85,30,45,70,60, None, 90,25,40,70,65, None, 70,98,40]
    >>> groups = ["nat.","int.","int.","nat.","int.", "int.","nat.","nat.","int.","int.","int.","int.","int.","int.","nat.", "int." ,None,"nat.","int.","int."]
    >>> ts_welch_t_is(groups, scores)
       n int.  n nat.  mean int.  mean nat.  diff.  hyp. diff.  statistic        df   p-value                              test
    0      12       6  61.916667  41.666667  20.25           0    1.69314  9.750994  0.122075  Welch independent samples t-test
    
    '''
    
    #convert to pandas series if needed
    if type(catField) is list:
        catField = pd.Series(catField)
    
    if type(scaleField) is list:
        scaleField = pd.Series(scaleField)
    
    #combine as one dataframe
    df = pd.concat([catField, scaleField], axis=1)
    df = df.dropna()
    
    #the two categories
    if categories is not None:
        cat1 = categories[0]
        cat2 = categories[1]
    else:
        cat1 = df.iloc[:,0].value_counts().index[0]
        cat2 = df.iloc[:,0].value_counts().index[1]
    
    #separate the scores for each category
    x1 = list(df.iloc[:,1][df.iloc[:,0] == cat1])
    x2 = list(df.iloc[:,1][df.iloc[:,0] == cat2])
    
    #make sure they are floats
    x1 = [float(x) for x in x1]
    x2 = [float(x) for x in x2]
    
    n1 = len(x1)
    n2 = len(x2)
    n = n1 + n2
    
    m1 = mean(x1)
    m2 = mean(x2)
    
    var1 = variance(x1)
    var2 = variance(x2)
    
    sse = var1/n1 + var2/n2
    se = (sse)**0.5
    
    tValue = (m1 - m2 - dmu)/se
    if df_ver=='ws' or df_ver=='welch-satterthwaite':
        df = sse**2/(var1**2/(n1**2*(n1 - 1)) + var2**2/(n2**2*(n2 - 1)))
    elif df_ver=='w' or df_ver=='welch':
        df = sse**2/(var1**2/(n1**2*(n1 + 1)) + var2**2/(n2**2*(n2 + 1))) - 2
    else:
        raise ValueError("df_ver should be 'ws', 'welch-satterthwaite', 'w', or 'welch'")
        
    pValue = 2*(1-t.cdf(abs(tValue), df))
    
    statistic = tValue
    testUsed = "Welch independent samples t-test"
    
    colnames = ["n "+cat1, "n "+cat2, "mean "+cat1, "mean "+cat2, "diff.", "hyp. diff.", "statistic", "df", "p-value", "test"]
    results = pd.DataFrame([[n1, n2, m1, m2, m1 - m2, dmu, statistic, df, pValue, testUsed]], columns=colnames)
    
    return results
