Module `stikpetP.tests.test_student_t_ps`

import pandas as pd
from scipy.stats import t 

def ts_student_t_ps(field1, field2, dmu=0):
    '''
    Student t Test (Paired Samples)
    -------------------------------
    The null hypothesis for this test is that the mean of the paired differences in the population equals a pre-defined value, usually zero (i.e. the two (arithmetic) means are the same in the population). If the p-value (significance) is below a pre-defined threshold (usually 0.05), the null hypothesis is rejected.
    
    Parameters
    ----------
    field1 : pandas series or list
        the ordinal or scale scores of the first variable
    field2 : pandas series or list
        the ordinal or scale scores of the second variable
    dmu : float, optional 
        hypothesized difference. Default is zero
    
    Returns
    -------
    res : pandas dataframe with one row and the columns
    
    * *n*, the number of scores
    * *statistic*, the test statistic (t-value)
    * *df*, the degrees of freedom
    * *p-value*, significance (p-value)
    
    Notes
    -----
    The formula used is:
    $$t = \\frac{\\bar{d} - d_{H0}}{SE}$$
    $$df = n - 1$$
    $$sig. = 2\\times\\left(1 - T\\left(\\left|t\\right|, df\\right)\\right)$$
    
    With:
    $$\\bar{d} = \\bar{x}_1 - \\bar{x}_2 = \\frac{\\sum_{i=1}^n d_i}{n}$$
    $$SE = \\sqrt{\\frac{s_d^2}{n}}$$
    $$s_d^2 = \\frac{\\sum_{i=1}^n \\left(d_i -\\bar{d}\\right)^2}{n-1}$$
    $$d_i = x_{i,1} - x_{i,2}$$
    $$\\bar{d}=\\frac{\\sum_{i=1}^n d_i}{n}$$
    
    *Symbols used*
    
    * \\(x_{i,1}\\), is the i-th score from the first variable
    * \\(x_{i,2}\\), is the i-th score from the second variable
    * \\(d_{H0}\\), difference according to null hypothesis (dmu parameter)
    * \\(T\\left(\\dots\\right)\\), cumulative distribution function of the Student t distribution.
    
    References
    ----------
    Student. (1908). The probable error of a mean. *Biometrika, 6*(1), 1–25. doi:10.1093/biomet/6.1.1
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076
    
    
    '''
    # Convert plain lists to pandas Series
    if isinstance(field1, list):
        field1 = pd.Series(field1)
        
    if isinstance(field2, list):
        field2 = pd.Series(field2)
    
    data = pd.concat([field1, field2], axis=1)
    data.columns = ["field1", "field2"]
    # Remove rows with missing values and reset the index
    # (reset_index returns a new frame, so the result must be assigned)
    data = data.dropna().reset_index(drop=True)
    
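    # number of complete pairs
    n = len(data["field1"])
    
    # paired differences, their sample standard deviation (ddof=1) and mean
    data["diffs"] = data["field1"] - data["field2"]
    dsigma = data["diffs"].std()
    dm = data["diffs"].mean()
    # standard error, test statistic and degrees of freedom
    se = dsigma/n**0.5
    tval = (dm - dmu)/se
    df = n - 1
    # two-sided p-value from the Student t distribution
    pvalue = 2 * (1 - t.cdf(abs(tval), df))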
    
    res = pd.DataFrame([[n, tval, df, pvalue]])
    res.columns = ["n", "statistic", "df", "p-value"]
    
    return res

Functions

def ts_student_t_ps(field1, field2, dmu=0)

Student t Test (Paired Samples)

The null hypothesis for this test is that the mean of the paired differences in the population equals a pre-defined value, usually zero (i.e. the two (arithmetic) means are the same in the population). If the p-value (significance) is below a pre-defined threshold (usually 0.05), the null hypothesis is rejected.

Parameters

field1 : pandas series or list: the ordinal or scale scores of the first variable
field2 : pandas series or list: the ordinal or scale scores of the second variable
dmu : float, optional: hypothesized difference. Default is zero
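
A non-zero `dmu` tests whether the mean difference equals that value rather than zero. The sketch below is not part of the package; it repeats the documented computation on invented scores and compares it with SciPy's one-sample t-test on the differences, which should agree. The variable names and data are assumptions made for illustration.

```python
import pandas as pd
from scipy import stats

# Hypothetical paired scores, invented for illustration
first = pd.Series([12.1, 11.4, 13.0, 12.7, 11.9, 12.5])
second = pd.Series([10.8, 10.9, 11.7, 11.0, 11.2, 11.1])
dmu = 1.0  # hypothesized mean difference under the null hypothesis

diffs = first - second
n = len(diffs)

# t = (mean difference - hypothesized difference) / standard error
t_value = (diffs.mean() - dmu) / (diffs.std() / n ** 0.5)
p_value = 2 * (1 - stats.t.cdf(abs(t_value), n - 1))

# The same test phrased as a one-sample t-test on the differences
check = stats.ttest_1samp(diffs, popmean=dmu)
print(t_value, p_value)
print(check.statistic, check.pvalue)
```

Both printed lines should show the same statistic and p-value up to floating-point rounding.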

Returns

res : pandas dataframe with one row and the columns

n, the number of scores
statistic, the test statistic (t-value)
df, the degrees of freedom
p-value, significance (p-value)
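
The reported *n* counts complete pairs only, because the source drops every row that has a missing value in either field before computing the differences. A minimal sketch of that step, using invented scores:

```python
import pandas as pd

# Hypothetical scores with gaps, invented for illustration
field1 = pd.Series([4, 5, None, 3, 5])
field2 = pd.Series([3, None, 2, 3, 4])

data = pd.concat([field1, field2], axis=1)
data.columns = ["field1", "field2"]

# Keep only rows where both scores are present
complete = data.dropna().reset_index(drop=True)

n = len(complete)  # number of complete pairs
df = n - 1         # degrees of freedom used by the test
print(n, df)       # prints: 3 2
```

Two of the five rows have a missing score, so three pairs remain and the test would use two degrees of freedom.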

Notes

The formula used is:

$$t = \frac{\bar{d} - d_{H0}}{SE}$$
$$df = n - 1$$
$$sig. = 2\times\left(1 - T\left(\left|t\right|, df\right)\right)$$

With:

$$d_i = x_{i,1} - x_{i,2}$$
$$\bar{d} = \bar{x}_1 - \bar{x}_2 = \frac{\sum_{i=1}^n d_i}{n}$$
$$s_d^2 = \frac{\sum_{i=1}^n \left(d_i -\bar{d}\right)^2}{n-1}$$
$$SE = \sqrt{\frac{s_d^2}{n}}$$

Symbols used

$x_{i,1}$, the i-th score from the first variable
$x_{i,2}$, the i-th score from the second variable
$n$, the number of complete pairs of scores
$d_{H0}$, the difference according to the null hypothesis (dmu parameter)
$T\left(\dots\right)$, the cumulative distribution function of the Student t distribution
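
The formulas above can be traced step by step on a small data set. This is a sketch on invented scores, not the package's own code; the variable names are assumptions, and SciPy's `ttest_rel` is used only as an independent cross-check.

```python
import pandas as pd
from scipy import stats

# Hypothetical paired scores (e.g. before and after), invented for illustration
before = pd.Series([5, 7, 6, 8, 9, 6, 7, 5])
after = pd.Series([6, 9, 6, 9, 11, 8, 7, 7])

d = before - after             # d_i = x_{i,1} - x_{i,2}
n = len(d)
d_bar = d.mean()               # mean of the differences
se = d.std(ddof=1) / n ** 0.5  # SE = sqrt(s_d^2 / n), s_d^2 with n - 1 in the denominator
d_h0 = 0                       # hypothesized difference (dmu)

t_value = (d_bar - d_h0) / se
df = n - 1
p_value = 2 * (1 - stats.t.cdf(abs(t_value), df))

# Cross-check against SciPy's paired-samples t-test
check = stats.ttest_rel(before, after)
print(t_value, df, p_value)
print(check.statistic, check.pvalue)
```

With a hypothesized difference of zero, the hand-computed statistic and p-value should match the `ttest_rel` result up to floating-point rounding.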

References

Student. (1908). The probable error of a mean. Biometrika, 6(1), 1–25. doi:10.1093/biomet/6.1.1

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
