Module stikpetP.tests.test_sign_os
import pandas as pd
from scipy.stats import binom
def ts_sign_os(data, levels=None, mu=None):
    '''
One-Sample Sign Test
--------------------
This function performs a one-sample sign test.
This function is shown in this [YouTube video](https://youtu.be/iEEFdHB3qhU) and the test is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Tests/SignOneSample.html).
Parameters
----------
data : list or pandas data series
the data
levels : dictionary, optional
the categories and numeric value to use
mu : float, optional
hypothesized median. Default is the midrange of the data
Returns
-------
pandas.DataFrame
A dataframe with the following columns:
- *mu* : the used hypothesized median.
- *p-value* : the significance (p-value)
- *test* : description of the test used
Notes
-----
This uses the binom function from scipy.stats for the binomial distribution cdf.
The p-value is calculated using (Stewart, 1941, p. 236):
$$p = 2\\times B\\left(n, \\text{min}\\left(n_+, n_-\\right), \\frac{1}{2}\\right)$$
*Symbols used:*
* $B\\left(\\dots\\right)$ is the binomial cumulative distribution function
* $n$ is the number of cases that differ from the hypothesized median ($n = n_+ + n_-$)
* $n_+$ is the number of cases above the hypothesized median
* $n_-$ is the number of cases below the hypothesized median
* $\\text{min}$ is the minimum of the two values
Since this can exceed 1 (when $n_+ = n_-$), the result is capped at 1.
The test is described in Stewart (1941), although there are earlier uses. The paired version, for example, was already described by Arbuthnott (1710).
Before, After and Alternatives
------------------------------
Before this test you might want an impression using a frequency table or a visualisation:
* [tab_frequency](../other/table_frequency.html#tab_frequency) for a frequency table
* [vi_bar_stacked_single](../visualisations/vis_bar_stacked_single.html#vi_bar_stacked_single) for Single Stacked Bar-Chart
* [vi_bar_dual_axis](../visualisations/vis_bar_dual_axis.html#vi_bar_dual_axis) for Dual-Axis Bar Chart
After this you might want to determine an effect size measure:
* [es_common_language_os](../effect_sizes/eff_size_common_language_os.html#es_common_language_os) for the Common Language Effect Size
* [es_dominance](../effect_sizes/eff_size_dominance.html#es_dominance) for the Dominance score
* [r_rank_biserial_os](../correlations/cor_rank_biserial_os.html#r_rank_biserial_os) for the Rank-Biserial Correlation
Alternative tests:
* [ts_trinomial_os](../tests/test_trinomial_os.html#ts_trinomial_os) for One-Sample Trinomial Test
* [ts_wilcoxon_os](../tests/test_wilcoxon_os.html#ts_wilcoxon_os) for Wilcoxon Signed Rank Test (One-Sample)
References
----------
Arbuthnott, J. (1710). An argument for divine providence, taken from the constant regularity observ’d in the births of both sexes. *Philosophical Transactions of the Royal Society of London, 27*(328), 186–190. doi:10.1098/rstl.1710.0011
Stewart, W. M. (1941). A note on the power of the sign test. *The Annals of Mathematical Statistics, 12*(2), 236–239. doi:10.1214/aoms/1177731755
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
---------
>>> pd.set_option('display.width',1000)
>>> pd.set_option('display.max_columns', 1000)
Example 1: pandas series
>>> df2 = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv', sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = df2['Teach_Motivate']
>>> order = {"Fully Disagree":1, "Disagree":2, "Neither disagree nor agree":3, "Agree":4, "Fully agree":5}
>>> ts_sign_os(ex1, levels=order)
    mu  p-value                  test
0  3.0  0.01952  one-sample sign test
Example 2: Numeric data
>>> ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
>>> ts_sign_os(ex2)
    mu   p-value                  test
0  3.0  0.454498  one-sample sign test
'''
    if type(data) is list:
        data = pd.Series(data)

    #remove missing values
    data = data.dropna()
    if levels is not None:
        data = data.map(levels).astype('Int8')
    else:
        data = pd.to_numeric(data)

    data = data.sort_values()

    #set hypothesized median to mid range if not provided
    if (mu is None):
        mu = (min(data) + max(data)) / 2

    #Determine count of cases below and above hypothesized median
    group1 = data[data<mu]
    group2 = data[data>mu]
    n1 = len(group1)
    n2 = len(group2)

    #Select the lowest of the two
    myMin = min(n1,n2)

    #Determine total number of cases (unequal to hyp. median)
    n = n1+n2

    #Determine the significance using binomial test
    pVal = 2*binom.cdf(myMin, n,0.5)
    #Doubling the cdf can exceed 1 (when n1 equals n2), so cap at 1
    if pVal > 1:
        pVal = 1

    testUsed = "one-sample sign test"
    testResults = pd.DataFrame([[mu, pVal, testUsed]], columns=["mu", "p-value", "test"])
    pd.set_option('display.max_colwidth', None)

    return(testResults)
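The final check in the code above caps the p-value at 1. The following is a small sketch of why that is needed, using only the standard library instead of scipy.stats.binom (the helper name `binom_cdf_half` and the counts are made up for this illustration): when the counts above and below the hypothesized median are equal, doubling the cumulative probability overshoots 1.

```python
from math import comb

def binom_cdf_half(k, n):
    """Hypothetical helper: P(X <= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k + 1)) / 2 ** n

# Made-up counts: 2 cases below and 2 cases above the hypothesized median.
n_neg, n_pos = 2, 2
n = n_neg + n_pos

raw = 2 * binom_cdf_half(min(n_neg, n_pos), n)  # doubled lower tail
capped = min(raw, 1.0)                          # same effect as the `if pVal > 1` check

print(raw, capped)
```

With equal counts the lower tail includes the centre of the distribution, so it is counted twice when doubled; the cap corrects for that.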
Functions
def ts_sign_os(data, levels=None, mu=None)
One-Sample Sign Test
This function performs a one-sample sign test.
This function is shown in this YouTube video (https://youtu.be/iEEFdHB3qhU) and the test is also described at PeterStatistics.com (https://peterstatistics.com/Terms/Tests/SignOneSample.html).
Parameters
data : list or pandas data series
the data
levels : dictionary, optional
the categories and numeric value to use
mu : float, optional
hypothesized median. Default is the midrange of the data
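As a rough illustration of the two optional parameters, the sketch below uses only the standard library, whereas the function itself works on a pandas Series. The `responses` list is made-up data, and unlike the pandas `map` call in the function, the plain dictionary lookup here raises an error for a label that is missing from `levels`.

```python
# Made-up ordinal responses; None stands in for a missing value.
responses = ["Agree", "Fully agree", None, "Disagree", "Fully Disagree", "Agree"]

# Same kind of dictionary as the `levels` parameter: label -> numeric value.
levels = {"Fully Disagree": 1, "Disagree": 2, "Neither disagree nor agree": 3,
          "Agree": 4, "Fully agree": 5}

# Drop missing values, then replace each label by its numeric value.
coded = [levels[r] for r in responses if r is not None]

# Default for `mu` when none is given: the midrange of the coded data.
mu = (min(coded) + max(coded)) / 2

print(coded, mu)
```

Note that the midrange depends on the smallest and largest observed values, so passing `mu` explicitly is the safer choice when the data do not cover the full scale.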
Returns
pandas.DataFrame
A dataframe with the following columns:
- mu : the used hypothesized median.
- p-value : the significance (p-value)
- test : description of the test used
Notes
This uses the binom function from scipy.stats for the binomial distribution cdf.
The p-value is calculated using (Stewart, 1941, p. 236):
$$p = 2\times B\left(n, \text{min}\left(n_+, n_-\right), \frac{1}{2}\right)$$
Symbols used:
- $B\left(\dots\right)$ is the binomial cumulative distribution function
- $n$ is the number of cases that differ from the hypothesized median ($n = n_+ + n_-$)
- $n_+$ is the number of cases above the hypothesized median
- $n_-$ is the number of cases below the hypothesized median
- $\text{min}$ is the minimum of the two values
Since this can exceed 1 (when $n_+ = n_-$), the result is capped at 1.
The test is described in Stewart (1941), although there are earlier uses. The paired version, for example, was already described by Arbuthnott (1710).
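The formula in these notes can be worked through by hand for the numeric data of Example 2. This is only a sketch: it sums the binomial probabilities with the standard library rather than calling scipy.stats.binom, and the helper name `binom_cdf_half` is an assumption made for this illustration.

```python
from math import comb

def binom_cdf_half(k, n):
    """Hypothetical helper: P(X <= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k + 1)) / 2 ** n

data = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
mu = (min(data) + max(data)) / 2      # midrange, the default hypothesized median

n_pos = sum(x > mu for x in data)     # cases above the hypothesized median
n_neg = sum(x < mu for x in data)     # cases below the hypothesized median
n = n_pos + n_neg                     # cases equal to mu are left out

# p = 2 * B(n, min(n_pos, n_neg), 1/2), capped at 1
p_value = min(1.0, 2 * binom_cdf_half(min(n_pos, n_neg), n))

print(n_pos, n_neg, n, round(p_value, 6))  # 10 6 16 0.454498
```

The two scores equal to 3 do not count towards $n$, which is why $n = 16$ although the list holds 18 values.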
Before, After and Alternatives
Before this test you might want an impression using a frequency table or a visualisation:
- tab_frequency for a frequency table
- vi_bar_stacked_single for Single Stacked Bar-Chart
- vi_bar_dual_axis for Dual-Axis Bar Chart
After this you might want to determine an effect size measure:
- es_common_language_os for the Common Language Effect Size
- es_dominance for the Dominance score
- r_rank_biserial_os for the Rank-Biserial Correlation
Alternative tests:
- ts_trinomial_os for One-Sample Trinomial Test
- ts_wilcoxon_os for Wilcoxon Signed Rank Test (One-Sample)
References
Arbuthnott, J. (1710). An argument for divine providence, taken from the constant regularity observ’d in the births of both sexes. Philosophical Transactions of the Royal Society of London, 27(328), 186–190. doi:10.1098/rstl.1710.0011
Stewart, W. M. (1941). A note on the power of the sign test. The Annals of Mathematical Statistics, 12(2), 236–239. doi:10.1214/aoms/1177731755
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
>>> pd.set_option('display.width',1000)
>>> pd.set_option('display.max_columns', 1000)
Example 1: pandas series
>>> df2 = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv', sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = df2['Teach_Motivate']
>>> order = {"Fully Disagree":1, "Disagree":2, "Neither disagree nor agree":3, "Agree":4, "Fully agree":5}
>>> ts_sign_os(ex1, levels=order)
    mu  p-value                  test
0  3.0  0.01952  one-sample sign test
Example 2: Numeric data
>>> ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
>>> ts_sign_os(ex2)
    mu   p-value                  test
0  3.0  0.454498  one-sample sign test