Module stikpetP.tests.test_z_os
from statistics import NormalDist
import pandas as pd
def ts_z_os(data, mu=None, sigma=None):
'''
Z Test (One-Sample)
-------------------
This test is often used if there is a large sample size. For smaller sample sizes, a Student t-test is usually used.
The assumption about the population (null hypothesis) for this test is a pre-defined mean, i.e. the (arithmetic) mean that is expected in the population. If the p-value (significance) is then below a pre-defined threshold (usually 0.05), the assumption is rejected.
This function is shown in this [YouTube video](https://youtu.be/Fg9SgN7uUwM) and the test is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Tests/zOneSample.html)
Parameters
----------
data : list or pandas data series
the data as numbers
mu : float, optional
hypothesized mean; if not set, the midrange of the data is used
sigma : float, optional
population standard deviation; if not set, the sample standard deviation is used
Returns
-------
pandas.DataFrame
A dataframe with the following columns:
* *mu*, the hypothesized mean
* *sample mean*, the sample mean
* *statistic*, the test statistic (z-value)
* *p-value*, the significance (p-value)
* *test used*, name of test used
Notes
-----
The formula used is:
$$z = \\frac{\\bar{x} - \\mu_{H_0}}{SE}$$
$$sig = 2\\times\\left(1 - \\Phi\\left(\\left|z\\right|\\right)\\right)$$
With:
$$SE = \\frac{\\sigma}{\\sqrt{n}}$$
$$\\sigma \\approx s = \\sqrt{\\frac{\\sum_{i=1}^n\\left(x_i - \\bar{x}\\right)^2}{n - 1}}$$
$$\\bar{x} = \\frac{\\sum_{i=1}^nx_i}{n}$$
*Symbols used:*
* $\\Phi\\left(\\dots\\right)$ the cumulative distribution function of the standard normal distribution
* $\\bar{x}$ the sample mean
* $\\mu_{H_0}$ the hypothesized mean in the population
* $SE$ the standard error (i.e. the standard deviation of the sampling distribution)
* $n$ the sample size (i.e. the number of scores)
* $s$ the unbiased sample standard deviation
* $x_i$ the i-th score
Before, After and Alternatives
------------------------------
Before this you might want to create a binned frequency table or a visualisation:
* [tab_frequency_bins](../other/table_frequency_bins.html#tab_frequency_bins) to create a binned frequency table
* [vi_boxplot_single](../visualisations/vis_boxplot_single.html#vi_boxplot_single) for a Box (and Whisker) Plot
* [vi_histogram](../visualisations/vis_histogram.html#vi_histogram) for a Histogram
* [vi_stem_and_leaf](../visualisations/vis_stem_and_leaf.html#vi_stem_and_leaf) for a Stem-and-Leaf Display
After this you might want an effect size measure:
* [es_cohen_d_os](../effect_sizes/eff_size_cohen_d_os.html#es_cohen_d_os) for Cohen d'
* [es_hedges_g_os](../effect_sizes/eff_size_hedges_g_os.html#es_hedges_g_os) for Hedges g
* [es_common_language_os](../eff_size_common_language_os/meas_variation.html#es_common_language_os) for the Common Language Effect Size
Alternative Tests:
* [ts_student_t_os](../tests/test_student_t_os.html#ts_student_t_os) for One-Sample Student t-Test
* [ts_trimmed_mean_os](../tests/test_trimmed_mean_os.html#ts_trimmed_mean_os) for One-Sample Trimmed (Yuen or Yuen-Welch) Mean Test
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
---------
>>> pd.set_option('display.width',1000)
>>> pd.set_option('display.max_columns', 1000)
Example 1: pandas series
>>> df2 = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv', sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = df2['Gen_Age']
>>> ts_z_os(ex1)
mu sample mean statistic p-value test used
0 68.5 24.454545 -19.291196 0.0 one-sample z
>>> ts_z_os(ex1, mu=22, sigma=12.1)
mu sample mean statistic p-value test used
0 22 24.454545 1.345588 0.178435 one-sample z
Example 2: Numeric list
>>> ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
>>> ts_z_os(ex2)
mu sample mean statistic p-value test used
0 3.0 3.444444 1.19335 0.232732 one-sample z
'''
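    # Accept plain lists by converting them to a pandas Series
    if isinstance(data, list):
        data = pd.Series(data)
    # Remove missing values
    data = data.dropna()
    # Default hypothesized mean: the midrange of the data
    if mu is None:
        mu = (min(data) + max(data)) / 2
    n = len(data)
    m = data.mean()
    # Use the sample standard deviation unless sigma is supplied
    if sigma is None:
        s = data.std()
    else:
        s = sigma
    # Standard error, test statistic and two-sided p-value
    se = s / n**0.5
    z = (m - mu) / se
    pValue = 2 * (1 - NormalDist().cdf(abs(z)))
    testUsed = "one-sample z"
    testResults = pd.DataFrame([[mu, m, z, pValue, testUsed]], columns=["mu", "sample mean", "statistic", "p-value", "test used"])
    return testResults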
Functions
def ts_z_os(data, mu=None, sigma=None)
Z Test (One-Sample)
This test is often used if there is a large sample size. For smaller sample sizes, a Student t-test is usually used.
The assumption about the population (null hypothesis) for this test is a pre-defined mean, i.e. the (arithmetic) mean that is expected in the population. If the p-value (significance) is then below a pre-defined threshold (usually 0.05), the assumption is rejected.
This function is shown in this YouTube video and the test is also described at PeterStatistics.com
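The decision rule described above can be sketched in a few lines. This is only an illustration and not part of the package: the helper name `reject_null` and its `alpha` parameter are assumptions made for this example, and the p-values are the ones reported in the examples on this page.

```python
def reject_null(p_value, alpha=0.05):
    """Return True when the p-value is below the chosen threshold.

    Hypothetical helper for illustration only; it is not part of the package.
    """
    return p_value < alpha


# p-values as reported in the examples on this page
print(reject_null(0.232732))  # Example 2 -> False
print(reject_null(0.178435))  # Example 1 with mu=22, sigma=12.1 -> False
```

In both cases the p-value is above 0.05, so the hypothesized mean would not be rejected at that threshold.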
Parameters
data : list or pandas data series
- the data as numbers
mu : float, optional
- hypothesized mean; if not set, the midrange of the data is used
sigma : float, optional
- population standard deviation; if not set, the sample standard deviation is used
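The two fallbacks for `mu` and `sigma` can be illustrated with the standard library alone. The sketch below is an assumption-laden simplification: `resolve_defaults` is a hypothetical helper, it works on a plain list rather than a pandas series, and it assumes the data contain no missing values.

```python
from statistics import stdev


def resolve_defaults(data, mu=None, sigma=None):
    """Hypothetical helper mirroring the documented fallbacks.

    Assumes `data` is a list of numbers without missing values.
    """
    if mu is None:
        # Midrange: halfway between the smallest and largest score
        mu = (min(data) + max(data)) / 2
    if sigma is None:
        # Sample standard deviation (n - 1 in the denominator)
        sigma = stdev(data)
    return mu, sigma


scores = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
mu, sigma = resolve_defaults(scores)
print(mu)               # 3.0
print(round(sigma, 4))  # 1.5801
```

The midrange is only a convenience default. In most analyses the hypothesized mean comes from theory or earlier research and should be passed explicitly.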
Returns
pandas.DataFrame
A dataframe with the following columns:
- mu, the hypothesized mean
- sample mean, the sample mean
- statistic, the test statistic (z-value)
- p-value, the significance (p-value)
- test used, name of test used
Notes
The formula used is:
$$z = \frac{\bar{x} - \mu_{H_0}}{SE}$$
$$sig = 2\times\left(1 - \Phi\left(\left|z\right|\right)\right)$$
With:
$$SE = \frac{\sigma}{\sqrt{n}}$$
$$\sigma \approx s = \sqrt{\frac{\sum_{i=1}^n\left(x_i - \bar{x}\right)^2}{n - 1}}$$
$$\bar{x} = \frac{\sum_{i=1}^nx_i}{n}$$
Symbols used:
- $\Phi\left(\dots\right)$ the cumulative distribution function of the standard normal distribution
- $\bar{x}$ the sample mean
- $\mu_{H_0}$ the hypothesized mean in the population
- $SE$ the standard error (i.e. the standard deviation of the sampling distribution)
- $n$ the sample size (i.e. the number of scores)
- $s$ the unbiased sample standard deviation
- $x_i$ the i-th score
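The formulas above can be traced step by step with only the Python standard library. The following is a minimal sketch, not the package's implementation: the function name `z_test_one_sample` and the parameter name `mu_h0` are chosen here for illustration, and the input is assumed to be a list without missing values.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_test_one_sample(scores, mu_h0, sigma=None):
    """Illustrative one-sample z-test; returns (z, two-sided p-value)."""
    n = len(scores)                                # sample size
    x_bar = mean(scores)                           # sample mean
    s = stdev(scores) if sigma is None else sigma  # sigma approximated by s if not given
    se = s / sqrt(n)                               # standard error
    z = (x_bar - mu_h0) / se                       # test statistic
    sig = 2 * (1 - NormalDist().cdf(abs(z)))       # two-sided significance
    return z, sig


scores = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
z, sig = z_test_one_sample(scores, mu_h0=3.0)
print(round(z, 5), round(sig, 4))
```

With these scores and a hypothesized mean of 3.0, the result should agree, up to rounding, with the statistic (1.19335) and p-value (0.232732) reported in Example 2 on this page.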
Before, After and Alternatives
Before this you might want to create a binned frequency table or a visualisation:
- tab_frequency_bins to create a binned frequency table
- vi_boxplot_single for a Box (and Whisker) Plot
- vi_histogram for a Histogram
- vi_stem_and_leaf for a Stem-and-Leaf Display
After this you might want an effect size measure:
- es_cohen_d_os for Cohen d'
- es_hedges_g_os for Hedges g
- es_common_language_os for the Common Language Effect Size
Alternative Tests:
- ts_student_t_os for One-Sample Student t-Test
- ts_trimmed_mean_os for One-Sample Trimmed (Yuen or Yuen-Welch) Mean Test
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
>>> pd.set_option('display.width',1000)
>>> pd.set_option('display.max_columns', 1000)
Example 1: pandas series
>>> df2 = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv', sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = df2['Gen_Age']
>>> ts_z_os(ex1)
mu sample mean statistic p-value test used
0 68.5 24.454545 -19.291196 0.0 one-sample z
>>> ts_z_os(ex1, mu=22, sigma=12.1)
mu sample mean statistic p-value test used
0 22 24.454545 1.345588 0.178435 one-sample z
Example 2: Numeric list
>>> ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
>>> ts_z_os(ex2)
mu sample mean statistic p-value test used
0 3.0 3.444444 1.19335 0.232732 one-sample z