Module stikpetP.other.poho_pairwise_bin
import pandas as pd
from ..tests.test_binomial_os import ts_binomial_os
from ..tests.test_wald_os import ts_wald_os
from ..tests.test_score_os import ts_score_os
from ..other.p_adjustments import p_adjust
def ph_pairwise_bin(data, test="binomial", expCount=None, mtc='bonferroni', **kwargs):
    '''
Pairwise Binary Test for Post-Hoc Analysis
--------------------------------------------
This function will perform a one-sample binary test for each possible pair in the data. This could either be a binomial, Wald or score test.
Both the unadjusted p-values and the p-values adjusted for multiple testing are reported. The Bonferroni adjustment is the default; use mtc to choose another method.
This function is shown in this [YouTube video](https://youtu.be/0uY4VAbvGpQ) and the test is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Tests/PostHocAfterGoF.html)
Parameters
----------
data : list or pandas series
test : {"binomial", "score", "wald"}, optional
test to use for each pair
expCount : pandas dataframe, optional
categories (first column) and expected counts (second column). If omitted, all categories are expected to be equally frequent.
mtc : string, optional
any of the methods available in p_adjust() to correct for multiple tests
**kwargs : optional
additional arguments for the specific test that are passed along.
Returns
-------
pandas.DataFrame
A dataframe with the following columns:
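- *category 1* : the label of the first category
- *category 2* : the label of the second category
- *n1* : the number of cases in the first category
- *n2* : the number of cases in the second category
- *n pair* : the number of cases in the two categories combined
- *obs. prop. 1* : the observed proportion of the first category within the pair
- *exp. prop. 1* : the expected proportion of the first category within the pair
- *statistic* : the test statistic ("n.a." for the binomial test)
- *p-value* : the unadjusted significance
- *adj. p-value* : the adjusted significance
- *test* : a description of the test that was used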
Notes
-----
None.
Before, After and Alternatives
------------------------------
Before this an omnibus test might be helpful:
* [ts_pearson_gof](../tests/test_pearson_gof.html#ts_pearson_gof) for Pearson Chi-Square Goodness-of-Fit Test
* [ts_freeman_tukey_gof](../tests/test_freeman_tukey_gof.html#ts_freeman_tukey_gof) for Freeman-Tukey Test of Goodness-of-Fit
* [ts_freeman_tukey_read](../tests/test_freeman_tukey_read.html#ts_freeman_tukey_read) for Freeman-Tukey-Read Test of Goodness-of-Fit
* [ts_g_gof](../tests/test_g_gof.html#ts_g_gof) for G (Likelihood Ratio) Goodness-of-Fit Test
* [ts_mod_log_likelihood_gof](../tests/test_mod_log_likelihood_gof.html#ts_mod_log_likelihood_gof) for Mod-Log Likelihood Test of Goodness-of-Fit
* [ts_multinomial_gof](../tests/test_multinomial_gof.html#ts_multinomial_gof) for Multinomial Goodness-of-Fit Test
* [ts_neyman_gof](../tests/test_neyman_gof.html#ts_neyman_gof) for Neyman Test of Goodness-of-Fit
* [ts_powerdivergence_gof](../tests/test_powerdivergence_gof.html#ts_powerdivergence_gof) for Power Divergence GoF Test
After this you might want to add an effect size measure:
* [es_post_hoc_gof](../effect_sizes/eff_size_post_hoc_gof.html#es_post_hoc_gof) for various effect sizes
Alternative post-hoc tests:
* [ph_pairwise_gof](../other/poho_pairwise_gof.html#ph_pairwise_gof) for Pairwise Goodness-of-Fit Tests
* [ph_residual_gof_bin](../other/poho_residual_gof_bin.html#ph_residual_gof_bin) for Residuals Tests
* [ph_residual_gof_gof](../other/poho_residual_gof_gof.html#ph_residual_gof_gof) for Residuals Using Goodness-of-Fit Tests
The binary test that is performed on each pair:
* [ts_binomial_os](../tests/test_binomial_os.html#ts_binomial_os) for One-Sample Binomial Test
* [ts_score_os](../tests/test_score_os.html#ts_score_os) for One-Sample Score Test
* [ts_wald_os](../tests/test_wald_os.html#ts_wald_os) for One-Sample Wald Test
More info on the adjustment for multiple testing:
* [p_adjust](../other/p_adjustments.html#p_adjust)
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
--------
Examples: get data
>>> import pandas as pd
>>> pd.set_option('display.width',1000)
>>> pd.set_option('display.max_columns', 1000)
>>> gss_df = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv', sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = gss_df['mar1'];
Example 1 using default settings:
>>> ph_pairwise_bin(ex1)
category 1 category 2 n1 n2 n pair obs. prop. 1 exp. prop. 1 statistic p-value adj. p-value test
0 MARRIED NEVER MARRIED 972.0 395.0 1367.0 0.711046 0.5 n.a. 1.052263e-56 1.052263e-55 one-sample binomial, with equal-distance method (with p0 for MARRIED)
1 MARRIED DIVORCED 972.0 314.0 1286.0 0.755832 0.5 n.a. 7.829174e-79 7.829174e-78 one-sample binomial, with equal-distance method (with p0 for MARRIED)
2 MARRIED WIDOWED 972.0 181.0 1153.0 0.843018 0.5 n.a. 1.407217e-131 1.407217e-130 one-sample binomial, with equal-distance method (with p0 for MARRIED)
3 MARRIED SEPARATED 972.0 79.0 1051.0 0.924833 0.5 n.a. 1.267980e-196 1.267980e-195 one-sample binomial, with equal-distance method (with p0 for MARRIED)
4 NEVER MARRIED DIVORCED 395.0 314.0 709.0 0.557123 0.5 n.a. 3.001933e-03 3.001933e-02 one-sample binomial, with equal-distance method (with p0 for NEVER MARRIED)
5 NEVER MARRIED WIDOWED 395.0 181.0 576.0 0.685764 0.5 n.a. 1.352112e-19 1.352112e-18 one-sample binomial, with equal-distance method (with p0 for NEVER MARRIED)
6 NEVER MARRIED SEPARATED 395.0 79.0 474.0 0.833333 0.5 n.a. 7.075688e-52 7.075688e-51 one-sample binomial, with equal-distance method (with p0 for NEVER MARRIED)
7 DIVORCED WIDOWED 314.0 181.0 495.0 0.634343 0.5 n.a. 3.295753e-09 3.295753e-08 one-sample binomial, with equal-distance method (with p0 for DIVORCED)
8 DIVORCED SEPARATED 314.0 79.0 393.0 0.798982 0.5 n.a. 1.472395e-34 1.472395e-33 one-sample binomial, with equal-distance method (with p0 for DIVORCED)
9 WIDOWED SEPARATED 181.0 79.0 260.0 0.696154 0.5 n.a. 2.223544e-10 2.223544e-09 one-sample binomial, with equal-distance method (with p0 for WIDOWED)
Example 2 using a score test with Yates correction:
>>> ph_pairwise_bin(ex1, test="score", mtc='holm', cc='yates')
category 1 category 2 n1 n2 n pair obs. prop. 1 exp. prop. 1 statistic p-value adj. p-value test
0 MARRIED NEVER MARRIED 972.0 395.0 1367.0 0.711046 0.5 -15.578952 0.000000e+00 0.000000e+00 one-sample score with Yates continuity correction (with p0 for MARRIED)
1 MARRIED DIVORCED 972.0 314.0 1286.0 0.755832 0.5 -18.320819 0.000000e+00 0.000000e+00 one-sample score with Yates continuity correction (with p0 for MARRIED)
2 MARRIED WIDOWED 972.0 181.0 1153.0 0.843018 0.5 -23.265503 0.000000e+00 0.000000e+00 one-sample score with Yates continuity correction (with p0 for MARRIED)
3 MARRIED SEPARATED 972.0 79.0 1051.0 0.924833 0.5 -27.514619 0.000000e+00 0.000000e+00 one-sample score with Yates continuity correction (with p0 for MARRIED)
4 NEVER MARRIED DIVORCED 395.0 314.0 709.0 0.557123 0.5 -3.004463 2.660501e-03 2.660501e-03 one-sample score with Yates continuity correction (with p0 for NEVER MARRIED)
5 NEVER MARRIED WIDOWED 395.0 181.0 576.0 0.685764 0.5 -8.875000 0.000000e+00 0.000000e+00 one-sample score with Yates continuity correction (with p0 for NEVER MARRIED)
6 NEVER MARRIED SEPARATED 395.0 79.0 474.0 0.833333 0.5 -14.468429 0.000000e+00 0.000000e+00 one-sample score with Yates continuity correction (with p0 for NEVER MARRIED)
7 DIVORCED WIDOWED 314.0 181.0 495.0 0.634343 0.5 -5.932959 2.975236e-09 5.950471e-09 one-sample score with Yates continuity correction (with p0 for DIVORCED)
8 DIVORCED SEPARATED 314.0 79.0 393.0 0.798982 0.5 -11.803739 0.000000e+00 0.000000e+00 one-sample score with Yates continuity correction (with p0 for DIVORCED)
9 WIDOWED SEPARATED 181.0 79.0 260.0 0.696154 0.5 -6.263754 3.758180e-10 1.127454e-09 one-sample score with Yates continuity correction (with p0 for WIDOWED)
    '''
    # a list is converted to a pandas Series so value_counts() can be used
    if isinstance(data, list):
        data = pd.Series(data)

    # fail early on an unknown test name instead of a NameError inside the loop
    if test not in ("binomial", "wald", "score"):
        raise ValueError("test must be 'binomial', 'wald' or 'score'")

    freq = data.value_counts()

    if expCount is None:
        # assume all categories to be equally frequent
        n = sum(freq)
        k = len(freq)
        categories = list(freq.index)
        expC = [n/k] * k
    else:
        # use the categories listed in expCount and rescale the expected
        # counts so they sum to the observed total of those categories
        nE = 0
        n = 0
        for i in range(0, len(expCount)):
            nE = nE + expCount.iloc[i, 1]
            n = n + freq[expCount.iloc[i, 0]]
        expC = []
        for i in range(0, len(expCount)):
            expC.append(expCount.iloc[i, 1]/nE*n)
        k = len(expC)
        categories = list(expCount.iloc[:, 0])

    results = pd.DataFrame()
    resRow = 0
    for i in range(0, k-1):
        for j in range(i+1, k):
            # category names
            results.at[resRow, 0] = categories[i]
            results.at[resRow, 1] = categories[j]

            # category sizes
            n1 = freq[categories[i]]
            n2 = freq[categories[j]]
            results.at[resRow, 2] = n1
            results.at[resRow, 3] = n2
            results.at[resRow, 4] = n1 + n2

            # observed and expected proportion of the first category within the pair
            obP1 = n1/(n1 + n2)
            exP1 = expC[i]/(expC[i]+expC[j])
            results.at[resRow, 5] = obP1
            results.at[resRow, 6] = exP1

            pair = [categories[i], categories[j]]

            if test=="binomial":
                # the binomial test reports no test statistic
                results.at[resRow, 7] = "n.a."
                pair_test_result = ts_binomial_os(data, codes=pair, p0=exP1, **kwargs)
                # the p-value
                results.at[resRow, 8] = pair_test_result.iloc[0, 0]
                # placeholder, overwritten with the adjusted p-value after the loop
                results.at[resRow, 9] = results.at[resRow, 8]
                # description of test
                results.at[resRow, 10] = pair_test_result.iloc[0, 1]
            else:
                if test=="wald":
                    pair_test_result = ts_wald_os(data, codes=pair, p0=exP1, **kwargs)
                elif test=="score":
                    pair_test_result = ts_score_os(data, codes=pair, p0=exP1, **kwargs)
                # the test statistic
                results.at[resRow, 7] = pair_test_result.iloc[0, 1]
                # the p-value
                results.at[resRow, 8] = pair_test_result.iloc[0, 2]
                # placeholder, overwritten with the adjusted p-value after the loop
                results.at[resRow, 9] = results.at[resRow, 8]
                # description of test
                results.at[resRow, 10] = pair_test_result.iloc[0, 3]

            resRow = resRow + 1

    # adjust the p-values for multiple testing
    results.iloc[:,9] = p_adjust(results.iloc[:,8], method=mtc)
    results.columns = ["category 1", "category 2", "n1", "n2", "n pair", "obs. prop. 1", "exp. prop. 1", "statistic", "p-value", "adj. p-value", "test"]
    return results
Functions
def ph_pairwise_bin(data, test='binomial', expCount=None, mtc='bonferroni', **kwargs)
Pairwise Binary Test for Post-Hoc Analysis
This function will perform a one-sample binary test for each possible pair in the data. This could either be a binomial, Wald or score test.
Both the unadjusted p-values and the p-values adjusted for multiple testing are reported. The Bonferroni adjustment is the default; use mtc to choose another method.
This function is shown in this YouTube video and the test is also described at PeterStatistics.com
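As a rough illustration of what "each possible pair" means, the sketch below enumerates the pairs and computes the observed and expected proportion of the first category within each pair. The counts are the ones shown in Example 1; the helper name `pair_summary` is made up for this illustration and is not part of the package.

```python
from itertools import combinations

# Observed counts as shown in Example 1 (variable 'mar1').
counts = {
    "MARRIED": 972,
    "NEVER MARRIED": 395,
    "DIVORCED": 314,
    "WIDOWED": 181,
    "SEPARATED": 79,
}

# Without expected counts, every category is expected equally often.
k = len(counts)
expected = {cat: sum(counts.values()) / k for cat in counts}


def pair_summary(counts, expected):
    """Hypothetical helper: one summary row per unordered pair of categories."""
    rows = []
    for cat1, cat2 in combinations(counts, 2):
        n1, n2 = counts[cat1], counts[cat2]
        rows.append({
            "category 1": cat1,
            "category 2": cat2,
            "n pair": n1 + n2,
            # proportion of the first category within the pair
            "obs. prop. 1": n1 / (n1 + n2),
            "exp. prop. 1": expected[cat1] / (expected[cat1] + expected[cat2]),
        })
    return rows


pairs = pair_summary(counts, expected)
print(len(pairs))   # k * (k - 1) / 2 pairs
print(pairs[0])
```

Each row then feeds one one-sample test, with the expected proportion used as p0.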
Parameters
data : list or pandas series
test : {"binomial", "score", "wald"}, optional
    test to use for each pair
expCount : pandas dataframe, optional
    categories (first column) and expected counts (second column). If omitted, all categories are expected to be equally frequent.
mtc : string, optional
    any of the methods available in p_adjust() to correct for multiple tests
**kwargs : optional
    additional arguments for the specific test that are passed along.
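The following sketch shows how expected counts turn into the expected proportion for a pair, following the logic in the source code: the counts are rescaled to the observed total, and the proportion for a pair is the first category's share of the two. The expected counts below are invented for illustration, and plain tuples stand in for the two columns of the expCount data frame.

```python
# Hypothetical expected counts: one (category, expected count) row per category,
# mirroring the two columns that are read from expCount.
exp_rows = [
    ("MARRIED", 50),
    ("NEVER MARRIED", 20),
    ("DIVORCED", 15),
    ("WIDOWED", 10),
    ("SEPARATED", 5),
]

# Observed counts as shown in Example 1.
observed = {
    "MARRIED": 972,
    "NEVER MARRIED": 395,
    "DIVORCED": 314,
    "WIDOWED": 181,
    "SEPARATED": 79,
}

# Rescale the expected counts so they sum to the observed total.
n = sum(observed[cat] for cat, _ in exp_rows)
n_exp = sum(count for _, count in exp_rows)
scaled = {cat: count / n_exp * n for cat, count in exp_rows}


def expected_pair_proportion(scaled, cat1, cat2):
    """Expected proportion of cat1 among the cases in cat1 or cat2."""
    return scaled[cat1] / (scaled[cat1] + scaled[cat2])


p0 = expected_pair_proportion(scaled, "MARRIED", "NEVER MARRIED")
print(p0)  # equals 50 / (50 + 20); the rescaling cancels out within a pair
```

Because the scaling factor cancels, only the ratio of the expected counts matters for each pair.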
Returns
pandas.DataFrame
A dataframe with the following columns:
- category 1 : the label of the first category
- category 2 : the label of the second category
- n1 : the number of cases in the first category
- n2 : the number of cases in the second category
- n pair : the number of cases in the two categories combined
- obs. prop. 1 : the observed proportion of the first category within the pair
- exp. prop. 1 : the expected proportion of the first category within the pair
- statistic : the test statistic ("n.a." for the binomial test)
- p-value : the unadjusted significance
- adj. p-value : the adjusted significance
- test : a description of the test that was used
Notes
None.
Before, After and Alternatives
Before this an omnibus test might be helpful:
* ts_pearson_gof for Pearson Chi-Square Goodness-of-Fit Test
* ts_freeman_tukey_gof for Freeman-Tukey Test of Goodness-of-Fit
* ts_freeman_tukey_read for Freeman-Tukey-Read Test of Goodness-of-Fit
* ts_g_gof for G (Likelihood Ratio) Goodness-of-Fit Test
* ts_mod_log_likelihood_gof for Mod-Log Likelihood Test of Goodness-of-Fit
* ts_multinomial_gof for Multinomial Goodness-of-Fit Test
* ts_neyman_gof for Neyman Test of Goodness-of-Fit
* ts_powerdivergence_gof for Power Divergence GoF Test
After this you might want to add an effect size measure:
* es_post_hoc_gof for various effect sizes
Alternative post-hoc tests:
* ph_pairwise_gof for Pairwise Goodness-of-Fit Tests
* ph_residual_gof_bin for Residuals Tests
* ph_residual_gof_gof for Residuals Using Goodness-of-Fit Tests
The binary test that is performed on each pair:
* ts_binomial_os for One-Sample Binomial Test
* ts_score_os for One-Sample Score Test
* ts_wald_os for One-Sample Wald Test
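To give a feel for what the per-pair tests compute, here is a minimal sketch for the special case p0 = 0.5 (equal expected counts). It is not the code of ts_binomial_os or ts_score_os; the function names are invented, and the Yates-corrected statistic is returned as a magnitude, so its sign may differ from what the package reports.

```python
from math import comb, erfc, sqrt


def binomial_two_sided_equal_p0(n1, n2):
    """Exact two-sided binomial p-value for a pair, assuming p0 = 0.5."""
    n = n1 + n2
    hi = max(n1, n2)
    # Upper tail P(X >= hi) under Binomial(n, 0.5), kept as an exact integer sum.
    upper = sum(comb(n, x) for x in range(hi, n + 1))
    # At p0 = 0.5 the distribution is symmetric, so the opposite tail is equal.
    return min(1.0, 2 * upper / 2 ** n)


def score_yates_equal_p0(n1, n2):
    """Score test with Yates continuity correction, assuming p0 = 0.5."""
    n = n1 + n2
    # |observed - expected| reduced by 0.5, divided by the standard error under H0.
    z = max(0.0, (abs(n1 - n * 0.5) - 0.5) / sqrt(n * 0.25))
    # Two-sided p-value from the standard normal distribution.
    p = erfc(z / sqrt(2))
    return z, p


# Small hand-checkable case: 8 versus 2 cases.
print(binomial_two_sided_equal_p0(8, 2))

# NEVER MARRIED versus DIVORCED, with the counts shown in the examples.
print(binomial_two_sided_equal_p0(395, 314))
print(score_yates_equal_p0(395, 314))
```

The last two lines can be compared with row 4 of Example 1 and Example 2 respectively.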
More info on the adjustment for multiple testing:
* p_adjust
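For orientation, the two adjustments used in the examples can be sketched as below. These are the textbook Bonferroni and Holm procedures, written here as standalone functions with invented names; the actual implementation in p_adjust is not shown on this page and may differ in detail. The input values are the unadjusted p-values listed in Example 1 and Example 2.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm step-down adjustment, returned in the original order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Adjusted values may never decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Unadjusted p-values as listed in Example 1 (binomial test).
p_example_1 = [1.052263e-56, 7.829174e-79, 1.407217e-131, 1.267980e-196,
               3.001933e-03, 1.352112e-19, 7.075688e-52, 3.295753e-09,
               1.472395e-34, 2.223544e-10]

# Unadjusted p-values as listed in Example 2 (score test with Yates correction).
p_example_2 = [0.0, 0.0, 0.0, 0.0, 2.660501e-03, 0.0, 0.0,
               2.975236e-09, 0.0, 3.758180e-10]

print(bonferroni(p_example_1))
print(holm(p_example_2))
```

With ten pairs, Bonferroni multiplies every p-value by 10, while Holm uses a smaller multiplier for each successively larger p-value, which is why the Holm-adjusted values in Example 2 are closer to the unadjusted ones.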
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
Examples: get data
>>> import pandas as pd
>>> pd.set_option('display.width',1000)
>>> pd.set_option('display.max_columns', 1000)
>>> gss_df = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv', sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = gss_df['mar1'];
Example 1 using default settings:
>>> ph_pairwise_bin(ex1)
category 1 category 2 n1 n2 n pair obs. prop. 1 exp. prop. 1 statistic p-value adj. p-value test
0 MARRIED NEVER MARRIED 972.0 395.0 1367.0 0.711046 0.5 n.a. 1.052263e-56 1.052263e-55 one-sample binomial, with equal-distance method (with p0 for MARRIED)
1 MARRIED DIVORCED 972.0 314.0 1286.0 0.755832 0.5 n.a. 7.829174e-79 7.829174e-78 one-sample binomial, with equal-distance method (with p0 for MARRIED)
2 MARRIED WIDOWED 972.0 181.0 1153.0 0.843018 0.5 n.a. 1.407217e-131 1.407217e-130 one-sample binomial, with equal-distance method (with p0 for MARRIED)
3 MARRIED SEPARATED 972.0 79.0 1051.0 0.924833 0.5 n.a. 1.267980e-196 1.267980e-195 one-sample binomial, with equal-distance method (with p0 for MARRIED)
4 NEVER MARRIED DIVORCED 395.0 314.0 709.0 0.557123 0.5 n.a. 3.001933e-03 3.001933e-02 one-sample binomial, with equal-distance method (with p0 for NEVER MARRIED)
5 NEVER MARRIED WIDOWED 395.0 181.0 576.0 0.685764 0.5 n.a. 1.352112e-19 1.352112e-18 one-sample binomial, with equal-distance method (with p0 for NEVER MARRIED)
6 NEVER MARRIED SEPARATED 395.0 79.0 474.0 0.833333 0.5 n.a. 7.075688e-52 7.075688e-51 one-sample binomial, with equal-distance method (with p0 for NEVER MARRIED)
7 DIVORCED WIDOWED 314.0 181.0 495.0 0.634343 0.5 n.a. 3.295753e-09 3.295753e-08 one-sample binomial, with equal-distance method (with p0 for DIVORCED)
8 DIVORCED SEPARATED 314.0 79.0 393.0 0.798982 0.5 n.a. 1.472395e-34 1.472395e-33 one-sample binomial, with equal-distance method (with p0 for DIVORCED)
9 WIDOWED SEPARATED 181.0 79.0 260.0 0.696154 0.5 n.a. 2.223544e-10 2.223544e-09 one-sample binomial, with equal-distance method (with p0 for WIDOWED)
Example 2 using a score test with Yates correction:
>>> ph_pairwise_bin(ex1, test="score", mtc='holm', cc='yates')
category 1 category 2 n1 n2 n pair obs. prop. 1 exp. prop. 1 statistic p-value adj. p-value test
0 MARRIED NEVER MARRIED 972.0 395.0 1367.0 0.711046 0.5 -15.578952 0.000000e+00 0.000000e+00 one-sample score with Yates continuity correction (with p0 for MARRIED)
1 MARRIED DIVORCED 972.0 314.0 1286.0 0.755832 0.5 -18.320819 0.000000e+00 0.000000e+00 one-sample score with Yates continuity correction (with p0 for MARRIED)
2 MARRIED WIDOWED 972.0 181.0 1153.0 0.843018 0.5 -23.265503 0.000000e+00 0.000000e+00 one-sample score with Yates continuity correction (with p0 for MARRIED)
3 MARRIED SEPARATED 972.0 79.0 1051.0 0.924833 0.5 -27.514619 0.000000e+00 0.000000e+00 one-sample score with Yates continuity correction (with p0 for MARRIED)
4 NEVER MARRIED DIVORCED 395.0 314.0 709.0 0.557123 0.5 -3.004463 2.660501e-03 2.660501e-03 one-sample score with Yates continuity correction (with p0 for NEVER MARRIED)
5 NEVER MARRIED WIDOWED 395.0 181.0 576.0 0.685764 0.5 -8.875000 0.000000e+00 0.000000e+00 one-sample score with Yates continuity correction (with p0 for NEVER MARRIED)
6 NEVER MARRIED SEPARATED 395.0 79.0 474.0 0.833333 0.5 -14.468429 0.000000e+00 0.000000e+00 one-sample score with Yates continuity correction (with p0 for NEVER MARRIED)
7 DIVORCED WIDOWED 314.0 181.0 495.0 0.634343 0.5 -5.932959 2.975236e-09 5.950471e-09 one-sample score with Yates continuity correction (with p0 for DIVORCED)
8 DIVORCED SEPARATED 314.0 79.0 393.0 0.798982 0.5 -11.803739 0.000000e+00 0.000000e+00 one-sample score with Yates continuity correction (with p0 for DIVORCED)
9 WIDOWED SEPARATED 181.0 79.0 260.0 0.696154 0.5 -6.263754 3.758180e-10 1.127454e-09 one-sample score with Yates continuity correction (with p0 for WIDOWED)