Module stikpetP.tests.test_student_t_ps
import pandas as pd
from scipy.stats import t


def ts_student_t_ps(field1, field2, dmu=0):
    '''
    Student t Test (Paired Samples)
    -------------------------------
    The null hypothesis for this test is that the population means of two paired variables differ by a pre-defined amount, usually zero (i.e. the arithmetic means are the same in the population). If the p-value (significance) falls below a pre-defined threshold (usually 0.05), the null hypothesis is rejected.

    Parameters
    ----------
    field1 : list or pandas Series
        the ordinal or scale scores of the first variable
    field2 : list or pandas Series
        the ordinal or scale scores of the second variable
    dmu : float, optional
        hypothesized difference. Default is zero

    Returns
    -------
    res : dataframe with
        * *n*, the number of pairs with a score on both variables
        * *statistic*, the test statistic (t-value)
        * *df*, the degrees of freedom
        * *p-value*, significance (p-value)

    Notes
    -----
    The formula used is:
    $$t = \\frac{\\bar{d} - d_{H0}}{SE}$$
    $$df = n - 1$$
    $$sig. = 2\\times\\left(1 - T\\left(\\left|t\\right|, df\\right)\\right)$$

    With:
    $$\\bar{d} = \\bar{x}_1 - \\bar{x}_2 = \\frac{\\sum_{i=1}^n d_i}{n}$$
    $$SE = \\sqrt{\\frac{s_d^2}{n}}$$
    $$s_d^2 = \\frac{\\sum_{i=1}^n \\left(d_i -\\bar{d}\\right)^2}{n-1}$$
    $$d_i = x_{i,1} - x_{i,2}$$

    *Symbols used*
    * \\(x_{i,1}\\), the i-th score from the first variable
    * \\(x_{i,2}\\), the i-th score from the second variable
    * \\(d_{H0}\\), the difference according to the null hypothesis (dmu parameter)
    * \\(T\\left(\\dots\\right)\\), the cumulative distribution function of the Student t distribution.

    References
    ----------
    Student. (1908). The probable error of a mean. *Biometrika, 6*(1), 1–25. doi:10.1093/biomet/6.1.1

    Author
    ------
    Made by P. Stikker

    Companion website: https://PeterStatistics.com
    YouTube channel: https://www.youtube.com/stikpet
    Donations: https://www.patreon.com/bePatron?u=19398076
    '''
    # Accept plain lists as well as pandas Series
    if isinstance(field1, list):
        field1 = pd.Series(field1)
    if isinstance(field2, list):
        field2 = pd.Series(field2)

    data = pd.concat([field1, field2], axis=1)
    data.columns = ["field1", "field2"]

    # Remove rows with missing values and reset the index
    # (reset_index returns a new frame, so the result must be assigned)
    data = data.dropna().reset_index(drop=True)

    # Number of complete pairs
    n = len(data["field1"])

    # Pairwise differences, their sample standard deviation and mean
    data["diffs"] = data["field1"] - data["field2"]
    dsigma = data["diffs"].std()
    dm = data["diffs"].mean()

    # Standard error, test statistic and degrees of freedom
    se = dsigma/n**0.5
    tval = (dm - dmu)/se
    df = n - 1

    # Two-sided p-value
    pvalue = 2 * (1 - t.cdf(abs(tval), df))

    res = pd.DataFrame([[n, tval, df, pvalue]])
    res.columns = ["n", "statistic", "df", "p-value"]

    return res
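As a sanity check, the computation in the listing can be compared with `scipy.stats.ttest_rel`, which runs the same paired test for a hypothesized difference of zero. The sketch below is not part of the module: the scores and variable names are made up for illustration, and incomplete pairs are dropped by hand to mirror the `dropna()` step.

```python
import numpy as np
import pandas as pd
from scipy.stats import t, ttest_rel

# Made-up paired scores; the NaN marks a missing value in one pair.
before = pd.Series([5, 7, 6, 9, 8, 4, np.nan, 6])
after = pd.Series([3, 6, 6, 7, 5, 5, 4, 4])

# Keep only complete pairs, as the listing does with dropna().
pairs = pd.concat([before, after], axis=1, keys=["before", "after"]).dropna()
diffs = pairs["before"] - pairs["after"]

# Same steps as the function body, with dmu = 0.
n = len(diffs)
se = diffs.std() / n**0.5      # pandas std() divides by n - 1 by default
tval = diffs.mean() / se
df = n - 1
pvalue = 2 * (1 - t.cdf(abs(tval), df))

# Reference result from SciPy for the same complete pairs.
ref = ttest_rel(pairs["before"], pairs["after"])

print(f"manual: t = {tval:.4f}, df = {df}, p = {pvalue:.4f}")
print(f"scipy : t = {ref.statistic:.4f}, p = {ref.pvalue:.4f}")
```

Both lines should report the same t-value and p-value, and `n` counts only the seven pairs without a missing score.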
Functions
def ts_student_t_ps(field1, field2, dmu=0)
Student t Test (Paired Samples)
The null hypothesis for this test is that the population means of two paired variables differ by a pre-defined amount, usually zero (i.e. the arithmetic means are the same in the population). If the p-value (significance) falls below a pre-defined threshold (usually 0.05), the null hypothesis is rejected.
Parameters
field1 : list or pandas Series
    the ordinal or scale scores of the first variable
field2 : list or pandas Series
    the ordinal or scale scores of the second variable
dmu : float, optional
    hypothesized difference. Default is zero
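A non-zero `dmu` can be read as a one-sample t-test on the pairwise differences against the value `dmu`. The sketch below illustrates that reading with `scipy.stats.ttest_1samp`. The data, the hypothesized difference of 1.0, and the helper name `paired_t` are assumptions made for illustration and are not part of the library.

```python
import numpy as np
from scipy.stats import t, ttest_1samp


def paired_t(x1, x2, dmu=0.0):
    """Hypothetical helper: paired t-test against a hypothesized difference."""
    d = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    n = d.size
    se = d.std(ddof=1) / np.sqrt(n)   # ddof=1 gives the sample standard deviation
    tval = (d.mean() - dmu) / se
    pvalue = 2 * (1 - t.cdf(abs(tval), n - 1))
    return tval, n - 1, pvalue


# Made-up paired measurements.
x1 = [12.1, 11.4, 13.0, 12.7, 10.9, 12.2]
x2 = [10.8, 10.9, 11.5, 11.9, 10.1, 10.6]

# Test whether the mean difference equals 1.0 rather than 0.
tval, df, pvalue = paired_t(x1, x2, dmu=1.0)

# Equivalent one-sample test on the differences.
ref = ttest_1samp(np.array(x1) - np.array(x2), popmean=1.0)

print(f"paired_t   : t = {tval:.4f}, df = {df}, p = {pvalue:.4f}")
print(f"ttest_1samp: t = {ref.statistic:.4f}, p = {ref.pvalue:.4f}")
```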
Returns
res : dataframe with
- n, the number of pairs with a score on both variables
- statistic, the test statistic (t-value)
- df, the degrees of freedom
- p-value, significance (p-value)
Notes
The formula used is:
$$t = \frac{\bar{d} - d_{H0}}{SE}$$
$$df = n - 1$$
$$sig. = 2\times\left(1 - T\left(\left|t\right|, df\right)\right)$$
With:
$$\bar{d} = \bar{x}_1 - \bar{x}_2 = \frac{\sum_{i=1}^n d_i}{n}$$
$$SE = \sqrt{\frac{s_d^2}{n}}$$
$$s_d^2 = \frac{\sum_{i=1}^n \left(d_i -\bar{d}\right)^2}{n-1}$$
$$d_i = x_{i,1} - x_{i,2}$$
Symbols used
- \(x_{i,1}\), the i-th score from the first variable
- \(x_{i,2}\), the i-th score from the second variable
- \(d_{H0}\), the difference according to the null hypothesis (dmu parameter)
- \(T\left(\dots\right)\), the cumulative distribution function of the Student t distribution.
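The formulas above can be traced step by step on a small example. The five score pairs below are invented for illustration, the variable names follow the symbols in the notes, and `scipy.stats.t.cdf` stands in for \(T\).

```python
from math import sqrt
from scipy.stats import t as t_dist

x1 = [5, 7, 6, 9, 8]   # scores on the first variable (made up)
x2 = [3, 6, 6, 7, 5]   # scores on the second variable (made up)
d_h0 = 0               # hypothesized difference (the dmu parameter)

n = len(x1)
d = [a - b for a, b in zip(x1, x2)]                   # d_i = x_{i,1} - x_{i,2}
d_bar = sum(d) / n                                    # mean difference
s2_d = sum((di - d_bar) ** 2 for di in d) / (n - 1)   # sample variance of d
se = sqrt(s2_d / n)                                   # standard error
t_stat = (d_bar - d_h0) / se                          # test statistic
df = n - 1                                            # degrees of freedom
sig = 2 * (1 - t_dist.cdf(abs(t_stat), df))           # two-sided p-value

print(f"d_bar = {d_bar:.2f}, s2_d = {s2_d:.2f}, SE = {se:.4f}")
print(f"t = {t_stat:.4f}, df = {df}, sig. = {sig:.4f}")
```

With these numbers the differences are 2, 1, 0, 2, 3, so \(\bar{d} = 1.6\) and \(s_d^2 = 1.3\), which gives \(SE = \sqrt{1.3/5} \approx 0.51\) and \(t \approx 3.14\) on 4 degrees of freedom.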
References
Student. (1908). The probable error of a mean. Biometrika, 6(1), 1–25. doi:10.1093/biomet/6.1.1
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076