Module stikpetP.tests.test_trinomial_os
Expand source code
import pandas as pd
import math
from ..distributions.dist_multinomial import di_mpmf
def ts_trinomial_os(data, levels=None, mu = None):
'''
One-Sample Trinomial Test
-------------------------
A test that could be used with ordinal data that includes ties
Similar as a sign-test but instead of ignoring scores that are tied with the hypothesized median they get included, hence instead of the binomial distribution, this will use the trinomial distribution.
This function is shown in this [YouTube video](https://youtu.be/6kTZkgkEqHw) and the test is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Tests/WilcoxonSignedRankOneSample.html)
Parameters
----------
data : list or pandas data series
the data
levels : dictionary, optional
the categories and numeric value to use
mu : float, optional
hypothesized median. Default is the midrange of the data
Returns
-------
pandas.DataFrame
A dataframe with the following columns:
* *mu*, the hypothesized median
* *n-pos*, the number scores above mu
* *n-neg*, the number scores below mu
* *n-tied*, the number of scores tied with mu
* *p-value*, significance (p-value)
* *test*, description of the test used
Notes
-----
The test uses the trinomial probability mass function and can be found in Bian et al. (2009, p. 6).
The formula used is:
$$p = 2\\times \\sum_{i=n_d}^n \\sum_{j=0}^{\\lfloor \\frac{n - i}{2} \\rfloor} \\text{tri}\\left(\\left(j, j+i, n - i\\right), \\left(p_{pos}, p_{neg}, p_0\\right) \\right)$$
With:
$$p_0 = \\frac{n_0}{n}$$
$$p_{pos} = p_{neg} = \\frac{1 - p_0}{n}$$
$$\\left|n_{pos} - n_{neg}\\right|$$
*Symbols used:*
* $n_0$, the number of scores equal to the hypothesized median
* $n_{pos}$, the number of scores above the hypothesized median
* $n_{neg}$, the number of scores below the hypothesized median
* $p_0$, the probability of the a score in the sample being equal to the hypothesized median
* $p_{pos}$, the population proportion of a score being above the hypothesized median
* $p_{neg}$, the population proportion of a score being below the hypothesized median
* $\\text{tri}\\left(…,… \\right)$, the trinomial probability mass function
The paired version of the test is described in Bian et al. (1941), while Zaiontz (n.d.) mentions it can also be used for one-sample situations.
Before, After and Alternatives
------------------------------
Before this test you might want an impression using a frequency table or a visualisation:
* [tab_frequency](../other/table_frequency.html#tab_frequency) for a frequency table
* [vi_bar_stacked_single](../visualisations/vis_bar_stacked_single.html#vi_bar_stacked_single) for Single Stacked Bar-Chart
* [vi_bar_dual_axis](../visualisations/vis_bar_dual_axis.html#vi_bar_dual_axis) for Dual-Axis Bar Chart
After this you might want to determine an effect size measure:
* [es_common_language_os](../effect_sizes/eff_size_common_language_os.html#es_common_language_os) for the Common Language Effect Size
* [es_dominance](../effect_sizes/eff_size_dominance.html#es_dominance) for the Dominance score
* [r_rank_biserial_os](../correlations/cor_rank_biserial_os.html#r_rank_biserial_os) for the Rank-Biserial Correlation
Alternative tests:
* [ts_sign_os](../tests/test_sign_os.html#ts_sign_os) for One-Sample Sign Test
* [ts_wilcoxon_os](../tests/test_wilcoxon_os.html#ts_wilcoxon_os) for Wilcoxon Signed Rank Test (One-Sample)
The function makes use of:
* [di_mcdf](../distributions/dist_multinomial.html#di_mcdf) for the Multinomial Cumulative Distribution Function
References
----------
Bian, G., McAleer, M., & Wong, W.-K. (2009). A trinomial test for paired data when there are many ties. *SSRN Electronic Journal*. doi:10.2139/ssrn.1410589
Zaiontz, C. (n.d.). Trinomial test. Real Statistics Using Excel. Retrieved March 2, 2023, from https://real-statistics.com/non-parametric-tests/trinomial-test/
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
---------
>>> pd.set_option('display.width',1000)
>>> pd.set_option('display.max_columns', 1000)
Example 1: pandas series
>>> student_df = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv', sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = student_df['Teach_Motivate']
>>> order = {"Fully Disagree":1, "Disagree":2, "Neither disagree nor agree":3, "Agree":4, "Fully agree":5}
>>> ts_trinomial_os(ex1, levels=order)
mu n-pos. n-neg. n-tied. p-value test
0 3.0 13 29 12 0.016261 one-sample trinomial
Example 2: Numeric data
>>> ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
>>> ts_trinomial_os(ex2)
mu n-pos. n-neg. n-tied. p-value test
0 3.0 10 6 2 0.385768 one-sample trinomial
'''
if type(data) is list:
data = pd.Series(data)
#remove missing values
data = data.dropna()
if levels is not None:
data = data.map(levels).astype('Int8')
else:
data = pd.to_numeric(data)
#set hypothesized median to mid range if not provided
if (mu is None):
mu = (min(data) + max(data)) / 2
nPos = sum(data>mu)
nNeg = sum(data<mu)
nNul = sum(data==mu)
n = nPos + nNeg + nNul
nd = abs(nPos-nNeg)
pNul = nNul/n
pPos = (1 - pNul)/2
pNeg = pPos
sig = 0
for d in range(nd, n+1):
for k in range(0, math.floor((n - d)/2)+1):
pmf = di_mpmf([k, k + d, n-k-(k+d)], [pPos, pNeg, pNul])
sig = sig + pmf
sig = sig*2
if sig>1:
sig = 1
testResults = pd.DataFrame([[mu, nPos, nNeg, nNul, sig, "one-sample trinomial"]], columns=["mu", "n-pos.", "n-neg.", "n-tied.", "p-value", "test"])
pd.set_option('display.max_colwidth', None)
return (testResults)
Functions
def ts_trinomial_os(data, levels=None, mu=None)
-
One-Sample Trinomial Test
A test that could be used with ordinal data that includes ties
Similar as a sign-test but instead of ignoring scores that are tied with the hypothesized median they get included, hence instead of the binomial distribution, this will use the trinomial distribution.
This function is shown in this YouTube video and the test is also described at PeterStatistics.com
Parameters
data
:list
orpandas data series
- the data
levels
:dictionary
, optional- the categories and numeric value to use
mu
:float
, optional- hypothesized median. Default is the midrange of the data
Returns
pandas.DataFrame
-
A dataframe with the following columns:
- mu, the hypothesized median
- n-pos, the number scores above mu
- n-neg, the number scores below mu
- n-tied, the number of scores tied with mu
- p-value, significance (p-value)
- test, description of the test used
Notes
The test uses the trinomial probability mass function and can be found in Bian et al. (2009, p. 6).
The formula used is: p = 2\times \sum_{i=n_d}^n \sum_{j=0}^{\lfloor \frac{n - i}{2} \rfloor} \text{tri}\left(\left(j, j+i, n - i\right), \left(p_{pos}, p_{neg}, p_0\right) \right)
With: p_0 = \frac{n_0}{n} p_{pos} = p_{neg} = \frac{1 - p_0}{n} \left|n_{pos} - n_{neg}\right|
Symbols used:
- $n_0$, the number of scores equal to the hypothesized median
- $n_{pos}$, the number of scores above the hypothesized median
- $n_{neg}$, the number of scores below the hypothesized median
- $p_0$, the probability of the a score in the sample being equal to the hypothesized median
- $p_{pos}$, the population proportion of a score being above the hypothesized median
- $p_{neg}$, the population proportion of a score being below the hypothesized median
- $\text{tri}\left(…,… \right)$, the trinomial probability mass function
The paired version of the test is described in Bian et al. (1941), while Zaiontz (n.d.) mentions it can also be used for one-sample situations.
Before, After and Alternatives
Before this test you might want an impression using a frequency table or a visualisation: * tab_frequency for a frequency table * vi_bar_stacked_single for Single Stacked Bar-Chart * vi_bar_dual_axis for Dual-Axis Bar Chart
After this you might want to determine an effect size measure: * es_common_language_os for the Common Language Effect Size * es_dominance for the Dominance score * r_rank_biserial_os for the Rank-Biserial Correlation
Alternative tests: * ts_sign_os for One-Sample Sign Test * ts_wilcoxon_os for Wilcoxon Signed Rank Test (One-Sample)
The function makes use of: * di_mcdf for the Multinomial Cumulative Distribution Function
References
Bian, G., McAleer, M., & Wong, W.-K. (2009). A trinomial test for paired data when there are many ties. SSRN Electronic Journal. doi:10.2139/ssrn.1410589
Zaiontz, C. (n.d.). Trinomial test. Real Statistics Using Excel. Retrieved March 2, 2023, from https://real-statistics.com/non-parametric-tests/trinomial-test/
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076Examples
>>> pd.set_option('display.width',1000) >>> pd.set_option('display.max_columns', 1000)
Example 1: pandas series
>>> student_df = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv', sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'}) >>> ex1 = student_df['Teach_Motivate'] >>> order = {"Fully Disagree":1, "Disagree":2, "Neither disagree nor agree":3, "Agree":4, "Fully agree":5} >>> ts_trinomial_os(ex1, levels=order) mu n-pos. n-neg. n-tied. p-value test 0 3.0 13 29 12 0.016261 one-sample trinomial
Example 2: Numeric data
>>> ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5] >>> ts_trinomial_os(ex2) mu n-pos. n-neg. n-tied. p-value test 0 3.0 10 6 2 0.385768 one-sample trinomial
Expand source code
def ts_trinomial_os(data, levels=None, mu = None): ''' One-Sample Trinomial Test ------------------------- A test that could be used with ordinal data that includes ties Similar as a sign-test but instead of ignoring scores that are tied with the hypothesized median they get included, hence instead of the binomial distribution, this will use the trinomial distribution. This function is shown in this [YouTube video](https://youtu.be/6kTZkgkEqHw) and the test is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Tests/WilcoxonSignedRankOneSample.html) Parameters ---------- data : list or pandas data series the data levels : dictionary, optional the categories and numeric value to use mu : float, optional hypothesized median. Default is the midrange of the data Returns ------- pandas.DataFrame A dataframe with the following columns: * *mu*, the hypothesized median * *n-pos*, the number scores above mu * *n-neg*, the number scores below mu * *n-tied*, the number of scores tied with mu * *p-value*, significance (p-value) * *test*, description of the test used Notes ----- The test uses the trinomial probability mass function and can be found in Bian et al. (2009, p. 6). The formula used is: $$p = 2\\times \\sum_{i=n_d}^n \\sum_{j=0}^{\\lfloor \\frac{n - i}{2} \\rfloor} \\text{tri}\\left(\\left(j, j+i, n - i\\right), \\left(p_{pos}, p_{neg}, p_0\\right) \\right)$$ With: $$p_0 = \\frac{n_0}{n}$$ $$p_{pos} = p_{neg} = \\frac{1 - p_0}{n}$$ $$\\left|n_{pos} - n_{neg}\\right|$$ *Symbols used:* * $n_0$, the number of scores equal to the hypothesized median * $n_{pos}$, the number of scores above the hypothesized median * $n_{neg}$, the number of scores below the hypothesized median * $p_0$, the probability of the a score in the sample being equal to the hypothesized median * $p_{pos}$, the population proportion of a score being above the hypothesized median * $p_{neg}$, the population proportion of a score being below the hypothesized median * $\\text{tri}\\left(…,… \\right)$, the trinomial probability mass function The paired version of the test is described in Bian et al. (1941), while Zaiontz (n.d.) mentions it can also be used for one-sample situations. Before, After and Alternatives ------------------------------ Before this test you might want an impression using a frequency table or a visualisation: * [tab_frequency](../other/table_frequency.html#tab_frequency) for a frequency table * [vi_bar_stacked_single](../visualisations/vis_bar_stacked_single.html#vi_bar_stacked_single) for Single Stacked Bar-Chart * [vi_bar_dual_axis](../visualisations/vis_bar_dual_axis.html#vi_bar_dual_axis) for Dual-Axis Bar Chart After this you might want to determine an effect size measure: * [es_common_language_os](../effect_sizes/eff_size_common_language_os.html#es_common_language_os) for the Common Language Effect Size * [es_dominance](../effect_sizes/eff_size_dominance.html#es_dominance) for the Dominance score * [r_rank_biserial_os](../correlations/cor_rank_biserial_os.html#r_rank_biserial_os) for the Rank-Biserial Correlation Alternative tests: * [ts_sign_os](../tests/test_sign_os.html#ts_sign_os) for One-Sample Sign Test * [ts_wilcoxon_os](../tests/test_wilcoxon_os.html#ts_wilcoxon_os) for Wilcoxon Signed Rank Test (One-Sample) The function makes use of: * [di_mcdf](../distributions/dist_multinomial.html#di_mcdf) for the Multinomial Cumulative Distribution Function References ---------- Bian, G., McAleer, M., & Wong, W.-K. (2009). A trinomial test for paired data when there are many ties. *SSRN Electronic Journal*. doi:10.2139/ssrn.1410589 Zaiontz, C. (n.d.). Trinomial test. Real Statistics Using Excel. Retrieved March 2, 2023, from https://real-statistics.com/non-parametric-tests/trinomial-test/ Author ------ Made by P. Stikker Companion website: https://PeterStatistics.com YouTube channel: https://www.youtube.com/stikpet Donations: https://www.patreon.com/bePatron?u=19398076 Examples --------- >>> pd.set_option('display.width',1000) >>> pd.set_option('display.max_columns', 1000) Example 1: pandas series >>> student_df = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv', sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'}) >>> ex1 = student_df['Teach_Motivate'] >>> order = {"Fully Disagree":1, "Disagree":2, "Neither disagree nor agree":3, "Agree":4, "Fully agree":5} >>> ts_trinomial_os(ex1, levels=order) mu n-pos. n-neg. n-tied. p-value test 0 3.0 13 29 12 0.016261 one-sample trinomial Example 2: Numeric data >>> ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5] >>> ts_trinomial_os(ex2) mu n-pos. n-neg. n-tied. p-value test 0 3.0 10 6 2 0.385768 one-sample trinomial ''' if type(data) is list: data = pd.Series(data) #remove missing values data = data.dropna() if levels is not None: data = data.map(levels).astype('Int8') else: data = pd.to_numeric(data) #set hypothesized median to mid range if not provided if (mu is None): mu = (min(data) + max(data)) / 2 nPos = sum(data>mu) nNeg = sum(data<mu) nNul = sum(data==mu) n = nPos + nNeg + nNul nd = abs(nPos-nNeg) pNul = nNul/n pPos = (1 - pNul)/2 pNeg = pPos sig = 0 for d in range(nd, n+1): for k in range(0, math.floor((n - d)/2)+1): pmf = di_mpmf([k, k + d, n-k-(k+d)], [pPos, pNeg, pNul]) sig = sig + pmf sig = sig*2 if sig>1: sig = 1 testResults = pd.DataFrame([[mu, nPos, nNeg, nNul, sig, "one-sample trinomial"]], columns=["mu", "n-pos.", "n-neg.", "n-tied.", "p-value", "test"]) pd.set_option('display.max_colwidth', None) return (testResults)