Module `stikpetP.tests.test_z_os`

from statistics import NormalDist
import pandas as pd

def ts_z_os(data, mu=None, sigma=None):
    '''
    Z Test (One-Sample)
    -------------------    
    This test is often used if there is a large sample size. For smaller sample sizes, a Student t-test is usually used.
    
    The null hypothesis for this test is that the population has a pre-defined (arithmetic) mean. If the p-value (significance) falls below a pre-defined threshold (usually 0.05), the null hypothesis is rejected.

    This function is shown in this [YouTube video](https://youtu.be/Fg9SgN7uUwM) and the test is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Tests/zOneSample.html).
    
    Parameters
    ----------
    data : list or pandas.Series
        the data as numbers
    mu : float, optional
        hypothesized mean; if not set, the midrange of the data is used
    sigma : float, optional
        population standard deviation; if not set, the sample standard deviation is used
    
    Returns
    -------
    pandas.DataFrame
        A dataframe with the following columns:
    
        * *mu*, the hypothesized mean
        * *sample mean*, the sample mean
        * *statistic*, the test statistic (z-value)
        * *p-value*, the significance (p-value)
        * *test used*, name of test used
    
    Notes
    -----
    The formula used is:
    $$z = \\frac{\\bar{x} - \\mu_{H_0}}{SE}$$
    $$sig = 2\\times\\left(1 - \\Phi\\left(\\left|z\\right|\\right)\\right)$$
    
    With:
    $$SE = \\frac{\\sigma}{\\sqrt{n}}$$
    $$\\sigma \\approx s = \\sqrt{\\frac{\\sum_{i=1}^n\\left(x_i - \\bar{x}\\right)^2}{n - 1}}$$
    $$\\bar{x} = \\frac{\\sum_{i=1}^nx_i}{n}$$
    
    *Symbols used:*
    
    * $\\Phi\\left(\\dots\\right)$ the cumulative distribution function of the standard normal distribution
    * $\\bar{x}$ the sample mean
    * $\\mu_{H_0}$ the hypothesized mean in the population
    * $SE$ the standard error (i.e. the standard deviation of the sampling distribution)
    * $n$ the sample size (i.e. the number of scores)
    * $s$ the sample standard deviation (with $n - 1$ in the denominator)
    * $x_i$ the i-th score

    Before, After and Alternatives
    ------------------------------
    Before this you might want to create a binned frequency table or a visualisation:
    * [tab_frequency_bins](../other/table_frequency_bins.html#tab_frequency_bins) to create a binned frequency table
    * [vi_boxplot_single](../visualisations/vis_boxplot_single.html#vi_boxplot_single) for a Box (and Whisker) Plot
    * [vi_histogram](../visualisations/vis_histogram.html#vi_histogram) for a Histogram
    * [vi_stem_and_leaf](../visualisations/vis_stem_and_leaf.html#vi_stem_and_leaf) for a Stem-and-Leaf Display

    After this you might want an effect size measure:
    * [es_cohen_d_os](../effect_sizes/eff_size_cohen_d_os.html#es_cohen_d_os) for Cohen d'
    * [es_hedges_g_os](../effect_sizes/eff_size_hedges_g_os.html#es_hedges_g_os) for Hedges g
    * [es_common_language_os](../eff_size_common_language_os/meas_variation.html#es_common_language_os) for the Common Language Effect Size
    
    Alternative Tests:
    * [ts_student_t_os](../tests/test_student_t_os.html#ts_student_t_os) for One-Sample Student t-Test
    * [ts_trimmed_mean_os](../tests/test_trimmed_mean_os.html#ts_trimmed_mean_os) for One-Sample Trimmed (Yuen or Yuen-Welch) Mean Test
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076
    
    Examples
    ---------
    >>> pd.set_option('display.width',1000)
    >>> pd.set_option('display.max_columns', 1000)
    
    Example 1: pandas series
    >>> df2 = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv', sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
    >>> ex1 = df2['Gen_Age']
    >>> ts_z_os(ex1)
         mu  sample mean  statistic  p-value     test used
    0  68.5    24.454545 -19.291196      0.0  one-sample z
    >>> ts_z_os(ex1, mu=22, sigma=12.1)
       mu  sample mean  statistic   p-value     test used
    0  22    24.454545   1.345588  0.178435  one-sample z
    
    Example 2: Numeric list
    >>> ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
    >>> ts_z_os(ex2)
        mu  sample mean  statistic   p-value     test used
    0  3.0     3.444444    1.19335  0.232732  one-sample z

    '''
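    # Accept plain lists by converting them to a pandas Series
    if isinstance(data, list):
        data = pd.Series(data)

    # Ignore missing values
    data = data.dropna()

    # Default hypothesized mean: the midrange of the data
    if mu is None:
        mu = (data.min() + data.max()) / 2

    n = len(data)
    m = data.mean()
    # Use the sample standard deviation (n - 1 denominator) if sigma is not given
    if sigma is None:
        s = data.std()
    else:
        s = sigma

    # Standard error, z statistic, and two-sided p-value
    se = s / n**0.5
    z = (m - mu) / se
    pValue = 2 * (1 - NormalDist().cdf(abs(z)))

    testUsed = "one-sample z"
    testResults = pd.DataFrame(
        [[mu, m, z, pValue, testUsed]],
        columns=["mu", "sample mean", "statistic", "p-value", "test used"],
    )

    return testResults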

Functions

def ts_z_os(data, mu=None, sigma=None)

Z Test (One-Sample)

This test is often used if there is a large sample size. For smaller sample sizes, a Student t-test is usually used.

The null hypothesis for this test is that the population has a pre-defined (arithmetic) mean. If the p-value (significance) falls below a pre-defined threshold (usually 0.05), the null hypothesis is rejected.
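The decision rule described above can be sketched in a few lines. This is only an illustration: the helper name `decide` is hypothetical and not part of the package, the threshold of 0.05 is the conventional default mentioned above, and the p-value is the one reported for Example 2 in this documentation.

```python
def decide(p_value, alpha=0.05):
    """Return True if the null hypothesis is rejected at level alpha.

    Hypothetical helper for illustration; not part of stikpetP.
    """
    return p_value < alpha


# p-value taken from the Example 2 output of ts_z_os
p_value = 0.232732
reject_null = decide(p_value)

if reject_null:
    print("Reject H0: the population mean differs from the hypothesized mean.")
else:
    print("Do not reject H0: no significant difference from the hypothesized mean.")
```

With this p-value the null hypothesis is not rejected at the 0.05 level.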

This function is shown in this YouTube video (https://youtu.be/Fg9SgN7uUwM) and the test is also described at PeterStatistics.com (https://peterstatistics.com/Terms/Tests/zOneSample.html).

Parameters

data : list or pandas.Series
    the data as numbers
mu : float, optional
    hypothesized mean; if not set, the midrange of the data is used
sigma : float, optional
    population standard deviation; if not set, the sample standard deviation is used
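To make the two defaults concrete, the sketch below computes them with the standard library for the list used in Example 2. It assumes that the sample standard deviation uses an n - 1 denominator, which is what both `statistics.stdev` and the default of pandas `Series.std()` do; the variable names are illustrative only.

```python
from statistics import stdev

ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]

# Default for mu: the midrange, i.e. halfway between the smallest and largest score
midrange = (min(ex2) + max(ex2)) / 2

# Default for sigma: the sample standard deviation (n - 1 in the denominator)
sample_sd = stdev(ex2)

print(midrange, round(sample_sd, 4))
```

The midrange here is 3.0, which matches the `mu` column shown for Example 2.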

Returns

pandas.DataFrame

A dataframe with the following columns:

mu, the hypothesized mean
sample mean, the sample mean
statistic, the test statistic (z-value)
p-value, the significance (p-value)
test used, name of test used

Notes

The formula used is:
$$z = \frac{\bar{x} - \mu_{H_0}}{SE}$$
$$sig = 2\times\left(1 - \Phi\left(\left|z\right|\right)\right)$$

With:
$$SE = \frac{\sigma}{\sqrt{n}}$$
$$\sigma \approx s = \sqrt{\frac{\sum_{i=1}^n\left(x_i - \bar{x}\right)^2}{n - 1}}$$
$$\bar{x} = \frac{\sum_{i=1}^nx_i}{n}$$

Symbols used:

$\Phi\left(\dots\right)$ the cumulative distribution function of the standard normal distribution
$\bar{x}$ the sample mean
$\mu_{H_0}$ the hypothesized mean in the population
$SE$ the standard error (i.e. the standard deviation of the sampling distribution)
$n$ the sample size (i.e. the number of scores)
$s$ the sample standard deviation (with $n - 1$ in the denominator)
$x_i$ the i-th score
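The formulas above can be followed step by step with only the standard library. This is a minimal sketch rather than the package's implementation: the function name `z_one_sample` is hypothetical, and the data and hypothesized mean are taken from Example 2 in this documentation.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_one_sample(scores, mu, sigma=None):
    """Return (z, sig) for a two-sided one-sample z test.

    Hypothetical helper mirroring the formulas in the Notes section.
    """
    n = len(scores)                                   # sample size
    x_bar = mean(scores)                              # sample mean
    s = stdev(scores) if sigma is None else sigma     # sigma, or s as its estimate
    se = s / sqrt(n)                                  # standard error
    z = (x_bar - mu) / se                             # test statistic
    sig = 2 * (1 - NormalDist().cdf(abs(z)))          # two-sided p-value
    return z, sig


ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
z, sig = z_one_sample(ex2, mu=3.0)
print(z, sig)
```

The printed values should agree with the `statistic` and `p-value` columns shown for Example 2 (about 1.19335 and 0.232732).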

Before, After and Alternatives

Before this you might want to create a binned frequency table or a visualisation:
* tab_frequency_bins to create a binned frequency table
* vi_boxplot_single for a Box (and Whisker) Plot
* vi_histogram for a Histogram
* vi_stem_and_leaf for a Stem-and-Leaf Display

After this you might want an effect size measure:
* es_cohen_d_os for Cohen d'
* es_hedges_g_os for Hedges g
* es_common_language_os for the Common Language Effect Size

Alternative Tests:
* ts_student_t_os for One-Sample Student t-Test
* ts_trimmed_mean_os for One-Sample Trimmed (Yuen or Yuen-Welch) Mean Test

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076

Examples

>>> pd.set_option('display.width',1000)
>>> pd.set_option('display.max_columns', 1000)

Example 1: pandas series

>>> df2 = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv', sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = df2['Gen_Age']
>>> ts_z_os(ex1)
     mu  sample mean  statistic  p-value     test used
0  68.5    24.454545 -19.291196      0.0  one-sample z
>>> ts_z_os(ex1, mu=22, sigma=12.1)
   mu  sample mean  statistic   p-value     test used
0  22    24.454545   1.345588  0.178435  one-sample z

Example 2: Numeric list

>>> ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
>>> ts_z_os(ex2)
    mu  sample mean  statistic   p-value     test used
0  3.0     3.444444    1.19335  0.232732  one-sample z
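The p-value of exactly 0.0 in Example 1 is a floating-point effect: for a very large |z|, the normal CDF rounds to 1.0, so 1 minus that value becomes 0. The sketch below shows an alternative formulation based on the complementary error function, using the identity 2(1 - Φ(|z|)) = erfc(|z| / √2). This is not what `ts_z_os` does; both function names are hypothetical, and the z-values are copied from the example outputs above.

```python
from math import erfc, sqrt
from statistics import NormalDist


def p_two_sided_naive(z):
    # Same formula as in the Notes section: 2 * (1 - Phi(|z|))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def p_two_sided_erfc(z):
    # Mathematically equivalent, but keeps precision in the far tail
    return erfc(abs(z) / sqrt(2))


z_extreme = -19.291196   # statistic from Example 1 (default mu)
z_moderate = 1.19335     # statistic from Example 2

print(p_two_sided_naive(z_extreme), p_two_sided_erfc(z_extreme))
print(p_two_sided_naive(z_moderate), p_two_sided_erfc(z_moderate))
```

For moderate z-values the two versions agree to within floating-point rounding; only in the extreme tail does the erfc version return a tiny positive number where the naive version returns 0.0. Either way the conclusion of the test is the same.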
