Module stikpetP.tests.test_student_t_os
from scipy.stats import t
import pandas as pd
def ts_student_t_os(data, mu=None):
'''
One-Sample Student t-Test
-------------------------
A test for a single (arithmetic) mean.
The assumption about the population (null hypothesis) for this test is a pre-defined mean, i.e. the (arithmetic) mean that is expected in the population. If the p-value (significance) is below a pre-defined threshold (usually 0.05), the assumption is rejected.
This function is shown in this [YouTube video](https://youtu.be/XEao_UFs1g8) and the test is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Tests/tOneSample.html)
Parameters
----------
data : list or pandas data series
the data as numbers
mu : float, optional
hypothesized mean, otherwise the midrange will be used
Returns
-------
pandas.DataFrame
A dataframe with the following columns:
* *mu*, the hypothesized mean
* *sample mean*, the sample mean
* *statistic*, the test statistic (t-value)
* *df*, the degrees of freedom
* *p-value*, the significance (p-value)
* *test used*, name of test used
Notes
-----
The formula used is:
$$t = \\frac{\\bar{x} - \\mu_{H_0}}{SE}$$
$$sig = 2\\times\\left(1 - T\\left(\\left|t\\right|, df\\right)\\right)$$
With:
$$df = n - 1$$
$$SE = \\frac{s}{\\sqrt{n}}$$
$$s = \\sqrt{\\frac{\\sum_{i=1}^n\\left(x_i - \\bar{x}\\right)^2}{n - 1}}$$
$$\\bar{x} = \\frac{\\sum_{i=1}^n x_i}{n}$$
*Symbols used:*
* $T\\left(\\dots, \\dots\\right)$ the cumulative distribution function of the t-distribution
* $\\bar{x}$ the sample mean
* $\\mu_{H_0}$ the hypothesized mean in the population
* $SE$ the standard error (i.e. the standard deviation of the sampling distribution)
* $n$ the sample size (i.e. the number of scores)
* $s$ the unbiased sample standard deviation
* $x_i$ the i-th score
The Student t test (Student, 1908) was described by Gosset under the pseudonym Student.
Before, After and Alternatives
------------------------------
Before this you might want to create a binned frequency table or a visualisation:
* [tab_frequency_bins](../other/table_frequency_bins.html#tab_frequency_bins) to create a binned frequency table
* [vi_boxplot_single](../visualisations/vis_boxplot_single.html#vi_boxplot_single) for a Box (and Whisker) Plot
* [vi_histogram](../visualisations/vis_histogram.html#vi_histogram) for a Histogram
* [vi_stem_and_leaf](../visualisations/vis_stem_and_leaf.html#vi_stem_and_leaf) for a Stem-and-Leaf Display
After this you might want an effect size measure:
* [es_cohen_d_os](../effect_sizes/eff_size_cohen_d_os.html#es_cohen_d_os) for Cohen d'
* [es_hedges_g_os](../effect_sizes/eff_size_hedges_g_os.html#es_hedges_g_os) for Hedges g
* [es_common_language_os](../eff_size_common_language_os/meas_variation.html#es_common_language_os) for the Common Language Effect Size
Alternative Tests:
* [ts_trimmed_mean_os](../tests/test_trimmed_mean_os.html#ts_trimmed_mean_os) for One-Sample Trimmed (Yuen or Yuen-Welch) Mean Test
* [ts_z_os](../tests/test_z_os.html#ts_z_os) for One-Sample Z Test
References
----------
Student. (1908). The probable error of a mean. *Biometrika, 6*(1), 1–25. doi:10.1093/biomet/6.1.1
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
---------
>>> pd.set_option('display.width',1000)
>>> pd.set_option('display.max_columns', 1000)
Example 1: pandas series
>>> df2 = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv', sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = df2['Gen_Age']
>>> ts_student_t_os(ex1)
mu sample mean statistic df p-value test used
0 68.5 24.454545 -19.291196 43 0.0 one-sample Student t
>>> ts_student_t_os(ex1, mu=22)
mu sample mean statistic df p-value test used
0 22 24.454545 1.075051 43 0.288347 one-sample Student t
Example 2: Numeric list
>>> ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
>>> ts_student_t_os(ex2)
mu sample mean statistic df p-value test used
0 3.0 3.444444 1.19335 17 0.249121 one-sample Student t
'''
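    # Accept plain lists by converting them to a pandas Series
    if isinstance(data, list):
        data = pd.Series(data)

    # Ignore missing values
    data = data.dropna()

    # Default hypothesized mean: the midrange of the observed data
    if mu is None:
        mu = (data.min() + data.max()) / 2

    n = len(data)
    m = data.mean()
    s = data.std()  # pandas uses n - 1 in the denominator by default
    se = s / n**0.5

    tValue = (m - mu) / se
    df = n - 1
    # Two-sided p-value from the t-distribution
    pValue = 2 * (1 - t.cdf(abs(tValue), df))

    testUsed = "one-sample Student t"
    testResults = pd.DataFrame(
        [[mu, m, tValue, df, pValue, testUsed]],
        columns=["mu", "sample mean", "statistic", "df", "p-value", "test used"],
    )
    return testResults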
Functions
def ts_student_t_os(data, mu=None)
One-Sample Student t-Test
A test for a single (arithmetic) mean.
The assumption about the population (null hypothesis) for this test is a pre-defined mean, i.e. the (arithmetic) mean that is expected in the population. If the p-value (significance) is below a pre-defined threshold (usually 0.05), the assumption is rejected.
This function is shown in this YouTube video and the test is also described at PeterStatistics.com
Parameters
data : list or pandas data series
- the data as numbers
mu : float, optional
- hypothesized mean, otherwise the midrange will be used
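The default for `mu` can be illustrated with a minimal sketch, assuming the midrange is computed as (minimum + maximum) / 2 as in the source code on this page. The variable names `scores` and `midrange` are only for illustration; the data are the list from Example 2.

```python
import pandas as pd

# Illustrative sample: the list used in Example 2 on this page
scores = pd.Series([1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5])

# Value used for mu when the caller does not supply one:
# the midrange of the observed data
midrange = (scores.min() + scores.max()) / 2

print(midrange)       # hypothesized mean that would be used by default
print(scores.mean())  # sample mean, for comparison
```

Because the midrange is derived from the sample itself, it can lie far from the sample mean (in Example 1 it is 68.5 against a sample mean of about 24.45), so passing `mu` explicitly is usually preferable when a meaningful hypothesized value exists.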
Returns
pandas.DataFrame
A dataframe with the following columns:
- mu, the hypothesized mean
- sample mean, the sample mean
- statistic, the test statistic (t-value)
- df, the degrees of freedom
- p-value, the significance (p-value)
- test used, name of test used
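Since the result is a one-row dataframe, individual values can be read by column name. The sketch below does not call the function itself; it builds a stand-in frame with the same columns, filled with the values shown in Example 2, and the threshold `alpha` is an assumed choice.

```python
import pandas as pd

# Stand-in for the returned frame, filled with the values from Example 2
result = pd.DataFrame(
    [[3.0, 3.444444, 1.19335, 17, 0.249121, "one-sample Student t"]],
    columns=["mu", "sample mean", "statistic", "df", "p-value", "test used"],
)

alpha = 0.05  # assumed significance threshold

# The frame has a single row with index 0
p_value = result.loc[0, "p-value"]
t_value = result.loc[0, "statistic"]

# Reject the null hypothesis only if the p-value is below the threshold
reject_h0 = p_value < alpha
print(f"t = {t_value:.3f}, p = {p_value:.3f}, reject H0: {reject_h0}")
```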
Notes
The formula used is:
$$t = \frac{\bar{x} - \mu_{H_0}}{SE}$$
$$sig = 2\times\left(1 - T\left(\left|t\right|, df\right)\right)$$
With:
$$df = n - 1$$
$$SE = \frac{s}{\sqrt{n}}$$
$$s = \sqrt{\frac{\sum_{i=1}^n\left(x_i - \bar{x}\right)^2}{n - 1}}$$
$$\bar{x} = \frac{\sum_{i=1}^n x_i}{n}$$
Symbols used:
- $T\left(\dots, \dots\right)$ the cumulative distribution function of the t-distribution
- $\bar{x}$ the sample mean
- $\mu_{H_0}$ the hypothesized mean in the population
- $SE$ the standard error (i.e. the standard deviation of the sampling distribution)
- $n$ the sample size (i.e. the number of scores)
- $s$ the unbiased sample standard deviation
- $x_i$ the i-th score
The Student t test (Student, 1908) was described by Gosset under the pseudonym Student.
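The formulas in these notes can be followed step by step in a short sketch. It uses the list from Example 2 with a hypothesized mean of 3.0, and cross-checks the result against `scipy.stats.ttest_1samp`; the variable names are illustrative and this is not the package's own code.

```python
from math import sqrt
from scipy.stats import t, ttest_1samp

scores = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
mu_h0 = 3.0  # hypothesized mean (the midrange of these scores)

n = len(scores)
mean = sum(scores) / n                                     # x-bar
s = sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))   # unbiased sample SD
se = s / sqrt(n)                                           # standard error
t_value = (mean - mu_h0) / se                              # test statistic
df = n - 1                                                 # degrees of freedom
p_value = 2 * (1 - t.cdf(abs(t_value), df))                # two-sided significance

# Independent check with SciPy's built-in one-sample t-test
check = ttest_1samp(scores, popmean=mu_h0)

print(t_value, p_value)
print(check.statistic, check.pvalue)
```

Both lines should agree with the statistic and p-value reported for Example 2.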
Before, After and Alternatives
Before this you might want to create a binned frequency table or a visualisation:
- tab_frequency_bins to create a binned frequency table
- vi_boxplot_single for a Box (and Whisker) Plot
- vi_histogram for a Histogram
- vi_stem_and_leaf for a Stem-and-Leaf Display
After this you might want an effect size measure:
- es_cohen_d_os for Cohen d'
- es_hedges_g_os for Hedges g
- es_common_language_os for the Common Language Effect Size
Alternative Tests:
- ts_trimmed_mean_os for One-Sample Trimmed (Yuen or Yuen-Welch) Mean Test
- ts_z_os for One-Sample Z Test
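As a rough illustration of how an effect size relates to the test statistic, the sketch below assumes the common textbook definition of the one-sample Cohen d', i.e. the difference between the sample mean and the hypothesized mean divided by the sample standard deviation. Whether es_cohen_d_os uses exactly this definition is not stated here, so treat the formula as an assumption; the data are the list from Example 2.

```python
from math import sqrt

scores = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
mu_h0 = 3.0  # hypothesized mean, as in Example 2

n = len(scores)
mean = sum(scores) / n
s = sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))

# Assumed textbook definition of the one-sample Cohen d'
cohen_d = (mean - mu_h0) / s

# Under this definition the t statistic is the effect size scaled by sqrt(n)
t_from_d = cohen_d * sqrt(n)

print(cohen_d, t_from_d)
```

Under this definition the effect size does not grow with the sample size, whereas the t statistic does.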
References
Student. (1908). The probable error of a mean. Biometrika, 6(1), 1–25. doi:10.1093/biomet/6.1.1
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
>>> pd.set_option('display.width',1000)
>>> pd.set_option('display.max_columns', 1000)
Example 1: pandas series
>>> df2 = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv', sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = df2['Gen_Age']
>>> ts_student_t_os(ex1)
mu sample mean statistic df p-value test used
0 68.5 24.454545 -19.291196 43 0.0 one-sample Student t
>>> ts_student_t_os(ex1, mu=22)
mu sample mean statistic df p-value test used
0 22 24.454545 1.075051 43 0.288347 one-sample Student t
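The p-value of 0.0 in the first call is most likely a floating-point effect rather than an exact zero: for a statistic this extreme, the cumulative distribution function rounds to 1, so `1 - T(|t|, df)` cancels to zero. The sketch below compares the documented formula with SciPy's survival function `t.sf`, which computes the same tail probability directly; it reuses the statistic and degrees of freedom shown above and is only an illustration, not a change to the function.

```python
from scipy.stats import t

t_value = -19.291196  # statistic reported in Example 1
df = 43               # degrees of freedom reported in Example 1

# Formula as documented: subtracting a CDF value that is almost 1 from 1
p_cdf = 2 * (1 - t.cdf(abs(t_value), df))

# Same two-sided tail probability via the survival function (1 - CDF),
# which avoids the subtraction and keeps very small values
p_sf = 2 * t.sf(abs(t_value), df)

print(p_cdf)
print(p_sf)
```

When reporting such a result it is common to write it as a bound, for example p < .001, rather than as zero.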
Example 2: Numeric list
>>> ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
>>> ts_student_t_os(ex2)
mu sample mean statistic df p-value test used
0 3.0 3.444444 1.19335 17 0.249121 one-sample Student t