Module stikpetP.tests.test_binomial_os

Source code
import pandas as pd
from scipy.stats import binom

# This function is used in ph_binomial()

def ts_binomial_os(data, p0=0.5, p0Cat=None, codes=None, twoSidedMethod="eqdist"):
    '''
    One-sample Binomial Test
    ------------------------
     
    Performs a one-sample (exact) binomial test. 
    
    This test can be useful with a single binary variable as input. The null hypothesis is usually that the proportions of the two categories in the population are equal (i.e. 0.5 for each). If the p-value of the test is below the pre-defined alpha level (usually 5% = 0.05), the null hypothesis is rejected and the two categories are considered to differ significantly in proportion.
    
    The input for the function doesn't have to be a binary variable. A nominal variable can also be used, with the two categories to compare indicated via the codes parameter.
    
    In general, a significance (p-value) is the probability of obtaining a result as extreme as, or more extreme than, the one in the sample, if the null hypothesis is true. For a two-tailed binomial test the 'or more extreme' part causes a bit of a complication. There are different methods to approach this problem; see the notes for more information.
    
    This function is shown in this [YouTube video](https://youtu.be/CzysWqVZzT0) and the binomial test is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Tests/binomial-one-sample.html)
    
    Parameters
    ----------
    data : list or pandas data series
        the data
    p0 : float, optional 
        hypothesized proportion for the first category (default is 0.5)
    p0Cat : optional
        the category to which p0 applies
    codes : list, optional 
        the two codes to use
    twoSidedMethod : {"eqdist", "double", "smallp"}, optional 
        method to be used for 2-sided significance. Default is "eqdist".
        
    Returns
    -------
    pandas.DataFrame
        A dataframe with the following columns:
    
        - *p-value (2-sided)* : the two-sided significance (p-value) 
        - *test* : description of the test used
    
    Notes
    -------
    To decide which category is associated with p0 the following is used:
    * If codes are provided, the first code is assumed to be the category for p0.
    * If p0Cat is specified, that category is used for p0 and all other categories are considered as category 2. This means that if there are more than two categories, the remaining ones (besides p0Cat) are merged into one large category.
    * If neither codes nor p0Cat is specified and there are more than two categories in the data, a warning is printed and no results are returned.
    * If neither codes nor p0Cat is specified and there are exactly two categories, p0 is assumed to be for the category closest matching the p0 value (i.e. if p0 is above 0.5, the category with the highest count is assumed to be used for p0).
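
    For illustration, a minimal sketch of these three options, using a hypothetical nominal series (the variable and values here are made up):

        marital = pd.Series(["MARRIED", "DIVORCED", "NEVER MARRIED", "DIVORCED", "MARRIED"])
        #codes: p0 is for "DIVORCED", the first code; other categories are ignored
        ts_binomial_os(marital, codes=["DIVORCED", "NEVER MARRIED"])
        #p0Cat: p0 is for "MARRIED"; all other categories are merged into one
        ts_binomial_os(marital, p0Cat="MARRIED")
        #neither: with more than two categories this prints a warning and returns None
        ts_binomial_os(marital)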
    
    It uses scipy.stats' binom. For the formulas below it is assumed that the observed proportion is less than the expected proportion; if this isn't the case, the right-tail probabilities are used.
    
    A one sided p-value is calculated first:
    $$sig_{one-tail} = B\\left(n, n_{min}, p_0^{\\ast}\\right)$$
    With:
    $$n_{min} = \\min \\left(n_s, n_f\\right)$$
    $$p_0^{\\ast} = \\begin{cases}p_0 & \\text{ if } n_{min}=n_s \\\\ 1 - p_0 & \\text{ if } n_{min}= n_f\\end{cases}$$
    Where:
    
    * \\(n\\) is the number of cases
    * \\(n_s\\) is the number of successes
    * \\(n_f\\) is the number of failures
    * \\(p_0\\) is the probability of a success according to the null hypothesis
    * \\(p_0^{\\ast}\\) is the probability adjusted in case failures is used
    * \\(B\\left(\\dots\\right)\\) the binomial cumulative distribution function
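
    As a worked example, take the data of Example 1 below: \\(n = 8\\), \\(n_s = 5\\), \\(n_f = 3\\), and \\(p_0 = 0.5\\). Then \\(n_{min} = 3 = n_f\\), so \\(p_0^{\\ast} = 1 - 0.5 = 0.5\\), and
    $$sig_{one-tail} = B\\left(8, 3, 0.5\\right) = \\frac{1 + 8 + 28 + 56}{2^8} = \\frac{93}{256} \\approx 0.3633$$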
    
    For the two-sided significance three options can be used.
    
    Option 1: Equal Distance Method (eqdist)
    $$sig_{two-tail} = B\\left(n, n_{min}, p_0^{\\ast}\\right) + 1 - B\\left(n, 2 \\times n_0 - n_{min} - 1, p_0^{\\ast}\\right)$$
    With:
    \\(n_0 = \\lfloor n\\times p_0\\rfloor\\)
    
    This method looks at the number of cases. In a sample of \\(n\\) people, we’d expect \\(n_0 = \\lfloor n\\times p_0\\rfloor\\) successes (the result is rounded down to the nearest integer). We only had \\(n_{min}\\), so a difference of \\(n_0-n_{min}\\). The ‘equal distance method’ now means to look for the chance of having \\(n_{min}\\) or less, and \\(n_0+n_0-n_{min}=2\\times n_0-n_{min}\\) or more. Each of these two probabilities can be found using a binomial distribution. Adding these two together then gives the two-sided significance. 
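
    Continuing the worked example: \\(n_0 = \\lfloor 8\\times 0.5\\rfloor = 4\\), so the mirror point is \\(2\\times 4 - 3 = 5\\), and
    $$sig_{two-tail} = B\\left(8, 3, 0.5\\right) + 1 - B\\left(8, 4, 0.5\\right) = \\frac{93}{256} + \\frac{93}{256} = \\frac{186}{256} \\approx 0.7266$$
    which matches the result shown in Example 1.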
    
    Option 2: Small p-method
    $$sig_{two-tail} = B\\left(n, n_{min}, p_0^{\\ast}\\right) + \\sum_{i=n_{min}+1}^n \\begin{cases} 0 & \\text{ if } b\\left(n, i, p_0^{\\ast}\\right) > b\\left(n, n_{min}, p_0^{\\ast}\\right) \\\\ b\\left(n, i, p_0^{\\ast}\\right) & \\text{ if } b\\left(n, i, p_0^{\\ast}\\right) \\leq b\\left(n, n_{min}, p_0^{\\ast}\\right) \\end{cases}$$
    With:
    \\(b\\left(\\dots\\right)\\) as the binomial probability mass function.
    
    This method looks at the probabilities themselves. \\(b\\left(n, n_{min}, p_0^{\\ast}\\right)\\) is the probability of having exactly \\(n_{min}\\) successes out of a group of \\(n\\), with a chance \\(p_0^{\\ast}\\) each time. The method of small p-values now considers ‘or more extreme’ any number between 0 and \\(n\\) (the sample size) that has a probability less than or equal to this. This means we need to go over each option, determine the probability and check if it is lower or equal. So, the probability of 0 successes, the probability of 1 success, etc. The sum of all of those will be the two-sided significance. We can reduce the work a little, since any value below \\(n_{min}\\) will also have a lower probability, so we only need to sum over the ones above it and add the one-sided significance to the sum of those.
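
    Continuing the worked example: \\(b\\left(8, 3, 0.5\\right) = \\frac{56}{256}\\). On the other side, \\(b\\left(8, 5, 0.5\\right) = \\frac{56}{256}\\), \\(b\\left(8, 6, 0.5\\right) = \\frac{28}{256}\\), \\(b\\left(8, 7, 0.5\\right) = \\frac{8}{256}\\) and \\(b\\left(8, 8, 0.5\\right) = \\frac{1}{256}\\) all qualify, while \\(b\\left(8, 4, 0.5\\right) = \\frac{70}{256}\\) does not, so
    $$sig_{two-tail} = \\frac{93}{256} + \\frac{56 + 28 + 8 + 1}{256} = \\frac{186}{256} \\approx 0.7266$$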
    
    Option 3: Double single
    $$sig_{two-tail} = 2\\times sig_{one-tail}$$
    
    Fairly straightforward: just double the one-sided significance.
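
    Continuing the worked example: \\(2\\times \\frac{93}{256} = \\frac{186}{256} \\approx 0.7266\\). With the symmetric \\(p_0 = 0.5\\) used there, all three methods give the same two-sided significance; for other values of \\(p_0\\) they can differ.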
    
    Before, After and Alternatives
    ------------------------------
    Before running the test you might first want to get an impression using a frequency table:
    [tab_frequency](../other/table_frequency.html#tab_frequency)

    After the test you might want an effect size measure:
    * [es_cohen_g](../effect_sizes/eff_size_cohen_g.html#es_cohen_g) for Cohen g
    * [es_cohen_h_os](../effect_sizes/eff_size_cohen_h_os.html#es_cohen_h_os) for Cohen h'
    * [es_alt_ratio](../effect_sizes/eff_size_alt_ratio.html#es_alt_ratio) for Alternative Ratio

    Alternatives for this test could be:
    * [ts_score_os](../tests/test_score_os.html#ts_score_os) for One-Sample Score Test
    * [ts_wald_os](../tests/test_wald_os.html#ts_wald_os) for One-Sample Wald Test
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076

    Examples
    ---------
    >>> pd.set_option('display.width',1000)
    >>> pd.set_option('display.max_columns', 1000)
    
    Example 1: Numeric list
    >>> ex1 = [1, 1, 2, 1, 2, 1, 2, 1]
    >>> ts_binomial_os(ex1)
       p-value (2-sided)                                                                 test
    0           0.726562  one-sample binomial, with equal-distance method (assuming p0 for 1)
    
    Setting a different hypothesized proportion:
    >>> ts_binomial_os(ex1, p0=0.3)
       p-value (2-sided)                                                                 test
    0           0.313266  one-sample binomial, with equal-distance method (assuming p0 for 1)
    
    Example 2: pandas Series
    >>> gss_df = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv', sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
    >>> ts_binomial_os(gss_df['sex'])
       p-value (2-sided)                                                                      test
    0           0.000006  one-sample binomial, with equal-distance method (assuming p0 for FEMALE)
    >>> ts_binomial_os(gss_df['mar1'], codes=["DIVORCED", "NEVER MARRIED"])
       p-value (2-sided)                                                                    test
    0           0.003002  one-sample binomial, with equal-distance method (with p0 for DIVORCED)
    
    '''
    if isinstance(data, list):
        data = pd.Series(data)

    #remove missing values
    data = data.dropna()
    
    testUsed = "one-sample binomial"
    
    #Determine number of successes, failures, and total sample size
    if codes is None:
        #create a frequency table
        freq = data.value_counts()

        if p0Cat is None:
            #check if there were exactly two categories or not
            if len(freq) != 2:
                # unable to determine which category p0 would belong to, so print warning and end
                print("WARNING: data does not have two unique categories, please specify two categories using codes parameter")
                return
            else:
                #simply select the two categories as cat1 and cat2
                n1 = freq.values[0]
                n2 = freq.values[1]
                n = n1 + n2
                #determine which category p0 is for (value_counts sorts descending, so n1 is the highest count)
                p0_cat = freq.index[0]
                if p0 > 0.5 and n1 < n2:
                    n1, n2 = n2, n1
                    p0_cat = freq.index[1]
                cat_used = " (assuming p0 for " + str(p0_cat) + ")"
        else:
            n = sum(freq.values)
            n1 = sum(data==p0Cat)
            n2 = n - n1
            p0_cat = p0Cat
            cat_used =  " (with p0 for " + str(p0Cat) + ")"            
    else:        
        n1 = sum(data==codes[0])
        n2 = sum(data==codes[1])
        n = n1 + n2
        cat_used =  " (with p0 for " + str(codes[0]) + ")"
    
    #work with the smaller of the two counts, adjusting the expected proportion to match
    minCount = n1
    ExpProp = p0
    ObsProp = n1/n
    if n2 < n1:
        minCount = n2
        ExpProp = 1 - p0
        ObsProp = n2/n
        
    #one-sided test: right tail if the observed proportion exceeds the expected one, left tail otherwise
    if ExpProp < ObsProp:
        sig1 = 1 - binom.cdf(minCount - 1, n, ExpProp)
    else:
        sig1 = binom.cdf(minCount, n, ExpProp)
    
    #two sided
    if twoSidedMethod=="double":
        #sig1 gets added to sig2 at the end, which doubles the one-sided significance
        sig2 = sig1
        testUsed = testUsed + ", with double one-sided method"
    
    elif twoSidedMethod=="eqdist":
        #Equal distance: mirror the observed count around the expected count
        ExpCount = int(n * ExpProp)
        #move the expected count towards the observed count if rounding was needed
        if minCount > ExpCount and n*ExpProp - ExpCount != 0:
            ExpCount = ExpCount + 1
        Dist = ExpCount - minCount
        if Dist==0:
            #observed count equals expected count, only the opposite tail remains
            sig2 = 1 - binom.cdf(minCount, n, ExpProp)
        else:
            OtherCount = ExpCount - Dist
            if Dist < 0:
                sig2 = 0
            elif ExpProp < ObsProp:
                sig2 = binom.cdf(OtherCount, n, ExpProp)
            else:
                #mirror count on the other side of the expected count
                OtherCount = ExpCount + Dist
                if OtherCount > n:
                    sig2 = 0
                else:
                    sig2 = 1 - binom.cdf(OtherCount - 1, n, ExpProp)
        testUsed = testUsed + ", with equal-distance method"
        
    else:
        #Method of small p-values
        #find the first value in the other direction with a pmf less than or equal to the observed one
        #(ties, i.e. an equal pmf, count as 'as extreme', matching the formula in the notes)
        binSmall = binom.pmf(minCount, n, ExpProp)
        #start above binSmall so the loop is entered
        binDist = binSmall + 1
        if ExpProp < ObsProp:
            #scan the lower tail, moving down from minCount - 1
            i = minCount - 1
            while binDist > binSmall and i>=0:
                binDist = binom.pmf(i, n, ExpProp)
                i = i - 1
            sig2 = binom.cdf(i+1, n, ExpProp)
        else:
            #scan the upper tail, moving up from minCount + 1
            i = minCount + 1
            while binDist > binSmall and i<=n:
                binDist = binom.pmf(i, n, ExpProp)
                i = i + 1
            #i has advanced two past the last value that still exceeded binSmall
            sig2 = 1 - binom.cdf(i - 2, n, ExpProp)

        testUsed = testUsed + ", with small p method"

    testUsed = testUsed + cat_used
    
    sigT = sig1 + sig2
    
    testResults = pd.DataFrame([[sigT, testUsed]], columns=["p-value (2-sided)", "test"])
    
    return testResults
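
A quick comparison of the three two-sided methods, as a minimal sketch using the data from Example 1 (by the formulas in the notes, all three coincide for this symmetric case, but they can differ when p0 is not 0.5):

    ex1 = [1, 1, 2, 1, 2, 1, 2, 1]
    for method in ("eqdist", "double", "smallp"):
        #each call returns a one-row DataFrame with the two-sided p-value and a test description
        print(ts_binomial_os(ex1, twoSidedMethod=method))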
