Module stikpetP.other.poho_pairwise_gof
Expand source code
import pandas as pd
from ..tests.test_multinomial_gof import ts_multinomial_gof
from ..tests.test_powerdivergence_gof import ts_powerdivergence_gof
from ..tests.test_neyman_gof import ts_neyman_gof
from ..tests.test_mod_log_likelihood_gof import ts_mod_log_likelihood_gof
from ..tests.test_g_gof import ts_g_gof
from ..tests.test_freeman_tukey_read import ts_freeman_tukey_read
from ..tests.test_freeman_tukey_gof import ts_freeman_tukey_gof
from ..tests.test_pearson_gof import ts_pearson_gof
from ..other.p_adjustments import p_adjust
def ph_pairwise_gof(data, test="pearson", expCount=None, mtc='bonferroni', **kwargs):
'''
Pairwise Goodness-of-Fit Tests for Post-Hoc Analysis
--------------------------------------------
This function will perform a goodness-of-fit test for each possible pair in the data. This could be any of the goodness-of-fit tests, e.g. a Pearson chi-square.
The unadjusted p-values and Bonferroni adjusted p-values are both determined.
This function is shown in this [YouTube video](https://youtu.be/DNx7-eVp16g) and the test is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Tests/PostHocAfterGoF.html)
Parameters
----------
data : list or pandas series
test : {"pearson", "freeman-tukey", "freeman-tukey-read", "g", "mod-log-g", "neyman", "powerdivergence", "multinomial"}, optional
test to use for each pair
expCount : pandas dataframe, optional
categories and expected counts
mtc : string, optional
any of the methods available in p_adjust() to correct for multiple tests
**kwargs : optional
additional arguments for the specific test that are passed along.
Returns
-------
pandas.DataFrame
A dataframe with the following columns:
- *category 1* : the label of the first category
- *category 2* : the label of the second category
- *n1* : the sample size of the first category
- *n2* : the sample size of the second category
- *obs. prop. 1* : the proportion in the sample of the first category
- *exp. prop. 1* : the expected proportion for the first category
- *statistic* : the chi-square test statistic
- *df* : the degrees of freedom or in case
- *p-value* : the unadjusted significance
- *adj. p-value* : the adjusted significance
- *minExp* : the minimum expected count
- *percBelow5* : the percentage of cells with an expected count below 5
- *test* : description of the test used
- In case of a multinomial test, the same columns except:
- *p obs* instead of *statistic* : showing the probability of the observed sample table
- *n combs.*, instead of *df* : showing the number of possible tables
- no *minExp* and *propBelow5* column.
Notes
-----
None
Before, After and Alternatives
------------------------------
Before this an omnibus test might be helpful, these are also the tests used on each pair:
* [ts_pearson_gof](../tests/test_pearson_gof.html#ts_pearson_gof) for Pearson Chi-Square Goodness-of-Fit Test
* [ts_freeman_tukey_gof](../tests/test_freeman_tukey_gof.html#ts_freeman_tukey_gof) for Freeman-Tukey Test of Goodness-of-Fit
* [ts_freeman_tukey_read](../tests/test_freeman_tukey_read.html#ts_freeman_tukey_read) for Freeman-Tukey-Read Test of Goodness-of-Fit
* [ts_g_gof](../tests/test_g_gof.html#ts_g_gof) for G (Likelihood Ratio) Goodness-of-Fit Test
* [ts_mod_log_likelihood_gof](../tests/test_mod_log_likelihood_gof.html#ts_mod_log_likelihood_gof) for Mod-Log Likelihood Test of Goodness-of-Fit
* [ts_multinomial_gof](../tests/test_multinomial_gof.html#ts_multinomial_gof) for Multinomial Goodness-of-Fit Test
* [ts_neyman_gof](../tests/test_neyman_gof.html#ts_neyman_gof) for Neyman Test of Goodness-of-Fit
* [ts_powerdivergence_gof](../tests/test_powerdivergence_gof.html#ts_powerdivergence_gof) for Power Divergence GoF Test
After this you might want to add an effect size measure:
* [es_post_hoc_gof](../effect_sizes/eff_size_post_hoc_gof.html#es_post_hoc_gof) for various effect sizes
Alternative post-hoc tests:
* [ph_pairwise_bin](../other/poho_pairwise_bin.html#ph_pairwise_bin) for Pairwise Binary Test
* [ph_residual_gof_bin](../other/poho_residual_gof_bin.html#ph_residual_gof_bin) for Residuals Tests
* [ph_residual_gof_gof](../other/poho_residual_gof_gof.html#ph_residual_gof_gof) for Residuals Using Goodness-of-Fit Tests
More info on the adjustment for multiple testing:
* [p_adjust](../other/p_adjustments.html#p_adjust)
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
--------
Examples: get data
>>> import pandas as pd
>>> pd.set_option('display.width',1000)
>>> pd.set_option('display.max_columns', 1000)
>>> gss_df = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv', sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = gss_df['mar1'];
Example 1 using default settings:
>>> ph_pairwise_gof(ex1)
category 1 category 2 n1 n2 obs. prop. 1 exp. prop. 1 statistic df p-value adj. p-value minExp percBelow5 test
0 MARRIED NEVER MARRIED 972.0 395.0 0.711046 0.5 243.547184 1.0 6.626501e-55 6.626501e-54 683.5 0.0 Pearson chi-square test of goodness-of-fit
1 MARRIED DIVORCED 972.0 314.0 0.755832 0.5 336.674961 1.0 3.380767e-75 3.380767e-74 643.0 0.0 Pearson chi-square test of goodness-of-fit
2 MARRIED WIDOWED 972.0 181.0 0.843018 0.5 542.654814 1.0 4.987536e-120 4.987536e-119 576.5 0.0 Pearson chi-square test of goodness-of-fit
3 MARRIED SEPARATED 972.0 79.0 0.924833 0.5 758.752617 1.0 5.015130e-167 5.015130e-166 525.5 0.0 Pearson chi-square test of goodness-of-fit
4 NEVER MARRIED DIVORCED 395.0 314.0 0.557123 0.5 9.253879 1.0 2.349972e-03 2.349972e-02 354.5 0.0 Pearson chi-square test of goodness-of-fit
5 NEVER MARRIED WIDOWED 395.0 181.0 0.685764 0.5 79.506944 1.0 4.805333e-19 4.805333e-18 288.0 0.0 Pearson chi-square test of goodness-of-fit
6 NEVER MARRIED SEPARATED 395.0 79.0 0.833333 0.5 210.666667 1.0 9.826958e-48 9.826958e-47 237.0 0.0 Pearson chi-square test of goodness-of-fit
7 DIVORCED WIDOWED 314.0 181.0 0.634343 0.5 35.735354 1.0 2.260249e-09 2.260249e-08 247.5 0.0 Pearson chi-square test of goodness-of-fit
8 DIVORCED SEPARATED 314.0 79.0 0.798982 0.5 140.521628 1.0 2.047138e-32 2.047138e-31 196.5 0.0 Pearson chi-square test of goodness-of-fit
9 WIDOWED SEPARATED 181.0 79.0 0.696154 0.5 40.015385 1.0 2.519705e-10 2.519705e-09 130.0 0.0 Pearson chi-square test of goodness-of-fit
Example 2 using a G test with Pearson correction:
>>> ph_pairwise_gof(ex1, test="g", mtc='holm', cc='pearson')
category 1 category 2 n1 n2 obs. prop. 1 exp. prop. 1 statistic df p-value adj. p-value minExp percBelow5 test
0 MARRIED NEVER MARRIED 972.0 395.0 0.711046 0.5 251.165891 1.0 1.446330e-56 1.012431e-55 683.5 0.0 G test of goodness-of-fit, and Pearson correction
1 MARRIED DIVORCED 972.0 314.0 0.755832 0.5 352.887628 1.0 9.961507e-79 7.969206e-78 643.0 0.0 G test of goodness-of-fit, and Pearson correction
2 MARRIED WIDOWED 972.0 181.0 0.843018 0.5 595.621443 1.0 1.500292e-131 1.350263e-130 576.5 0.0 G test of goodness-of-fit, and Pearson correction
3 MARRIED SEPARATED 972.0 79.0 0.924833 0.5 895.323655 1.0 1.019541e-196 1.019541e-195 525.5 0.0 G test of goodness-of-fit, and Pearson correction
4 NEVER MARRIED DIVORCED 395.0 314.0 0.557123 0.5 9.261034 1.0 2.340808e-03 2.340808e-03 354.5 0.0 G test of goodness-of-fit, and Pearson correction
5 NEVER MARRIED WIDOWED 395.0 181.0 0.685764 0.5 81.303809 1.0 1.935532e-19 7.742128e-19 288.0 0.0 G test of goodness-of-fit, and Pearson correction
6 NEVER MARRIED SEPARATED 395.0 79.0 0.833333 0.5 229.486329 1.0 7.715537e-52 4.629322e-51 237.0 0.0 G test of goodness-of-fit, and Pearson correction
7 DIVORCED WIDOWED 314.0 181.0 0.634343 0.5 36.105157 1.0 1.869514e-09 3.739029e-09 247.5 0.0 G test of goodness-of-fit, and Pearson correction
8 DIVORCED SEPARATED 314.0 79.0 0.798982 0.5 150.008211 1.0 1.726494e-34 8.632469e-34 196.5 0.0 G test of goodness-of-fit, and Pearson correction
9 WIDOWED SEPARATED 181.0 79.0 0.696154 0.5 40.952653 1.0 1.559621e-10 4.678864e-10 130.0 0.0 G test of goodness-of-fit, and Pearson correction
'''
if type(data) is list:
data = pd.Series(data)
freq = data.value_counts()
if expCount is None:
#assume all to be equal
n = sum(freq)
k = len(freq)
categories = list(freq.index)
expC = [n/k] * k
else:
#check if categories match
nE = 0
n = 0
for i in range(0, len(expCount)):
nE = nE + expCount.iloc[i,1]
n = n + freq[expCount.iloc[i,0]]
expC = []
for i in range(0,len(expCount)):
expC.append(expCount.iloc[i, 1]/nE*n)
k = len(expC)
categories = list(expCount.iloc[:,0])
n_pairs = int(k*(k-1)/2)
results = pd.DataFrame()
resRow=0
for i in range(0, k-1):
for j in range(i+1, k):
#category names
results.at[resRow, 0] = categories[i]
results.at[resRow, 1] = categories[j]
#category sizes
n1 = freq[categories[i]]
n2 = freq[categories[j]]
results.at[resRow, 2] = n1
results.at[resRow, 3] = n2
#data and expected counts
expected_proportion_1 = expC[i]/n
expected_proportion_2 = expC[j]/n
exp_count_1 = (n1 + n2)*(expected_proportion_1*1/(expected_proportion_1+expected_proportion_2))
exp_count_2 = (n1 + n2)*(expected_proportion_2*1/(expected_proportion_1+expected_proportion_2))
exP = pd.DataFrame([[categories[i], exp_count_1], [categories[j], exp_count_2]], columns=['category', 'count'])
results.at[resRow, 4] = n1/(n1 + n2)
results.at[resRow, 5] = exp_count_1/(n1 + n2)
pair = [categories[i], categories[j]]
data_pair = data[data.isin(pair)]
if test=="pearson":
pair_test_result = ts_pearson_gof(data_pair, expCounts=exP, **kwargs)
elif test=="freeman-tukey":
pair_test_result = ts_freeman_tukey_gof(data_pair, expCounts=exP, **kwargs)
elif test=="freeman-tukey-read":
pair_test_result = ts_freeman_tukey_read(data_pair, expCounts=exP, **kwargs)
elif test=="g":
pair_test_result = ts_g_gof(data_pair, expCounts=exP, **kwargs)
elif test=="mod-log-g":
pair_test_result = ts_mod_log_likelihood_gof(data_pair, expCounts=exP, **kwargs)
elif test=="neyman":
pair_test_result = ts_neyman_gof(data_pair, expCounts=exP, **kwargs)
elif test=="powerdivergence":
pair_test_result = ts_powerdivergence_gof(data_pair, expCounts=exP, **kwargs)
if test=="multinomial":
pair_test_result = ts_multinomial_gof(data_pair, expCounts=exP, **kwargs)
results.at[resRow, 6] = pair_test_result.iloc[0, 0]
results.at[resRow, 7] = pair_test_result.iloc[0, 1]
results.at[resRow, 8] = pair_test_result.iloc[0, 2]
results.at[resRow, 9] = results.at[resRow, 8]
results.at[resRow, 10] = pair_test_result.iloc[0, 3]
else:
results.at[resRow, 6] = pair_test_result.iloc[0, 2]
results.at[resRow, 7] = pair_test_result.iloc[0, 3]
results.at[resRow, 8] = pair_test_result.iloc[0, 4]
results.at[resRow, 9] = results.at[resRow, 8]
results.at[resRow, 10] = pair_test_result.iloc[0, 5]
results.at[resRow, 11] = pair_test_result.iloc[0, 6]
results.at[resRow, 12] = pair_test_result.iloc[0, 7]
resRow = resRow + 1
results.iloc[:,9] = p_adjust(results.iloc[:,8], method=mtc)
if test == "multinomial":
# Set columns for multinomial case
results.columns = [
"category 1", "category 2", "n1", "n2", "obs. prop. 1", "exp. prop. 1", "p obs", "n combs.",
"p-value", "adj. p-value", "test"
]
else:
# Set columns for other cases
results.columns = [
"category 1", "category 2", "n1", "n2", "obs. prop. 1", "exp. prop. 1", "statistic", "df",
"p-value", "adj. p-value", "minExp", "percBelow5", "test"
]
return results
Functions
def ph_pairwise_gof(data, test='pearson', expCount=None, mtc='bonferroni', **kwargs)-
Pairwise Goodness-of-Fit Tests for Post-Hoc Analysis
This function will perform a goodness-of-fit test for each possible pair in the data. This could be any of the goodness-of-fit tests, e.g. a Pearson chi-square.
The unadjusted p-values and Bonferroni adjusted p-values are both determined.
This function is shown in this YouTube video and the test is also described at PeterStatistics.com
Parameters
data:listorpandas seriestest:{"pearson", "freeman-tukey", "freeman-tukey-read", "g", "mod-log-g", "neyman", "powerdivergence", "multinomial"}, optional- test to use for each pair
expCount:pandas dataframe, optional- categories and expected counts
mtc:string, optional- any of the methods available in p_adjust() to correct for multiple tests
**kwargs:optional- additional arguments for the specific test that are passed along.
Returns
pandas.DataFrame-
A dataframe with the following columns:
- category 1 : the label of the first category
- category 2 : the label of the second category
- n1 : the sample size of the first category
- n2 : the sample size of the second category
- obs. prop. 1 : the proportion in the sample of the first category
-
exp. prop. 1 : the expected proportion for the first category
-
statistic : the chi-square test statistic
- df : the degrees of freedom or in case
- p-value : the unadjusted significance
- adj. p-value : the adjusted significance
- minExp : the minimum expected count
- percBelow5 : the percentage of cells with an expected count below 5
-
test : description of the test used
-
In case of a multinomial test, the same columns except:
-
p obs instead of statistic : showing the probability of the observed sample table
- n combs., instead of df : showing the number of possible tables
- no minExp and propBelow5 column.
Notes
None
Before, After and Alternatives
Before this an omnibus test might be helpful, these are also the tests used on each pair: * ts_pearson_gof for Pearson Chi-Square Goodness-of-Fit Test * ts_freeman_tukey_gof for Freeman-Tukey Test of Goodness-of-Fit * ts_freeman_tukey_read for Freeman-Tukey-Read Test of Goodness-of-Fit * ts_g_gof for G (Likelihood Ratio) Goodness-of-Fit Test * ts_mod_log_likelihood_gof for Mod-Log Likelihood Test of Goodness-of-Fit * ts_multinomial_gof for Multinomial Goodness-of-Fit Test * ts_neyman_gof for Neyman Test of Goodness-of-Fit * ts_powerdivergence_gof for Power Divergence GoF Test
After this you might want to add an effect size measure: * es_post_hoc_gof for various effect sizes
Alternative post-hoc tests: * ph_pairwise_bin for Pairwise Binary Test * ph_residual_gof_bin for Residuals Tests * ph_residual_gof_gof for Residuals Using Goodness-of-Fit Tests
More info on the adjustment for multiple testing: * p_adjust
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076Examples
Examples: get data
>>> import pandas as pd >>> pd.set_option('display.width',1000) >>> pd.set_option('display.max_columns', 1000) >>> gss_df = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv', sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'}) >>> ex1 = gss_df['mar1'];Example 1 using default settings:
>>> ph_pairwise_gof(ex1) category 1 category 2 n1 n2 obs. prop. 1 exp. prop. 1 statistic df p-value adj. p-value minExp percBelow5 test 0 MARRIED NEVER MARRIED 972.0 395.0 0.711046 0.5 243.547184 1.0 6.626501e-55 6.626501e-54 683.5 0.0 Pearson chi-square test of goodness-of-fit 1 MARRIED DIVORCED 972.0 314.0 0.755832 0.5 336.674961 1.0 3.380767e-75 3.380767e-74 643.0 0.0 Pearson chi-square test of goodness-of-fit 2 MARRIED WIDOWED 972.0 181.0 0.843018 0.5 542.654814 1.0 4.987536e-120 4.987536e-119 576.5 0.0 Pearson chi-square test of goodness-of-fit 3 MARRIED SEPARATED 972.0 79.0 0.924833 0.5 758.752617 1.0 5.015130e-167 5.015130e-166 525.5 0.0 Pearson chi-square test of goodness-of-fit 4 NEVER MARRIED DIVORCED 395.0 314.0 0.557123 0.5 9.253879 1.0 2.349972e-03 2.349972e-02 354.5 0.0 Pearson chi-square test of goodness-of-fit 5 NEVER MARRIED WIDOWED 395.0 181.0 0.685764 0.5 79.506944 1.0 4.805333e-19 4.805333e-18 288.0 0.0 Pearson chi-square test of goodness-of-fit 6 NEVER MARRIED SEPARATED 395.0 79.0 0.833333 0.5 210.666667 1.0 9.826958e-48 9.826958e-47 237.0 0.0 Pearson chi-square test of goodness-of-fit 7 DIVORCED WIDOWED 314.0 181.0 0.634343 0.5 35.735354 1.0 2.260249e-09 2.260249e-08 247.5 0.0 Pearson chi-square test of goodness-of-fit 8 DIVORCED SEPARATED 314.0 79.0 0.798982 0.5 140.521628 1.0 2.047138e-32 2.047138e-31 196.5 0.0 Pearson chi-square test of goodness-of-fit 9 WIDOWED SEPARATED 181.0 79.0 0.696154 0.5 40.015385 1.0 2.519705e-10 2.519705e-09 130.0 0.0 Pearson chi-square test of goodness-of-fitExample 2 using a G test with Pearson correction:
>>> ph_pairwise_gof(ex1, test="g", mtc='holm', cc='pearson') category 1 category 2 n1 n2 obs. prop. 1 exp. prop. 1 statistic df p-value adj. p-value minExp percBelow5 test 0 MARRIED NEVER MARRIED 972.0 395.0 0.711046 0.5 251.165891 1.0 1.446330e-56 1.012431e-55 683.5 0.0 G test of goodness-of-fit, and Pearson correction 1 MARRIED DIVORCED 972.0 314.0 0.755832 0.5 352.887628 1.0 9.961507e-79 7.969206e-78 643.0 0.0 G test of goodness-of-fit, and Pearson correction 2 MARRIED WIDOWED 972.0 181.0 0.843018 0.5 595.621443 1.0 1.500292e-131 1.350263e-130 576.5 0.0 G test of goodness-of-fit, and Pearson correction 3 MARRIED SEPARATED 972.0 79.0 0.924833 0.5 895.323655 1.0 1.019541e-196 1.019541e-195 525.5 0.0 G test of goodness-of-fit, and Pearson correction 4 NEVER MARRIED DIVORCED 395.0 314.0 0.557123 0.5 9.261034 1.0 2.340808e-03 2.340808e-03 354.5 0.0 G test of goodness-of-fit, and Pearson correction 5 NEVER MARRIED WIDOWED 395.0 181.0 0.685764 0.5 81.303809 1.0 1.935532e-19 7.742128e-19 288.0 0.0 G test of goodness-of-fit, and Pearson correction 6 NEVER MARRIED SEPARATED 395.0 79.0 0.833333 0.5 229.486329 1.0 7.715537e-52 4.629322e-51 237.0 0.0 G test of goodness-of-fit, and Pearson correction 7 DIVORCED WIDOWED 314.0 181.0 0.634343 0.5 36.105157 1.0 1.869514e-09 3.739029e-09 247.5 0.0 G test of goodness-of-fit, and Pearson correction 8 DIVORCED SEPARATED 314.0 79.0 0.798982 0.5 150.008211 1.0 1.726494e-34 8.632469e-34 196.5 0.0 G test of goodness-of-fit, and Pearson correction 9 WIDOWED SEPARATED 181.0 79.0 0.696154 0.5 40.952653 1.0 1.559621e-10 4.678864e-10 130.0 0.0 G test of goodness-of-fit, and Pearson correctionExpand source code
def ph_pairwise_gof(data, test="pearson", expCount=None, mtc='bonferroni', **kwargs): ''' Pairwise Goodness-of-Fit Tests for Post-Hoc Analysis -------------------------------------------- This function will perform a goodness-of-fit test for each possible pair in the data. This could be any of the goodness-of-fit tests, e.g. a Pearson chi-square. The unadjusted p-values and Bonferroni adjusted p-values are both determined. This function is shown in this [YouTube video](https://youtu.be/DNx7-eVp16g) and the test is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Tests/PostHocAfterGoF.html) Parameters ---------- data : list or pandas series test : {"pearson", "freeman-tukey", "freeman-tukey-read", "g", "mod-log-g", "neyman", "powerdivergence", "multinomial"}, optional test to use for each pair expCount : pandas dataframe, optional categories and expected counts mtc : string, optional any of the methods available in p_adjust() to correct for multiple tests **kwargs : optional additional arguments for the specific test that are passed along. Returns ------- pandas.DataFrame A dataframe with the following columns: - *category 1* : the label of the first category - *category 2* : the label of the second category - *n1* : the sample size of the first category - *n2* : the sample size of the second category - *obs. prop. 1* : the proportion in the sample of the first category - *exp. prop. 1* : the expected proportion for the first category - *statistic* : the chi-square test statistic - *df* : the degrees of freedom or in case - *p-value* : the unadjusted significance - *adj. p-value* : the adjusted significance - *minExp* : the minimum expected count - *percBelow5* : the percentage of cells with an expected count below 5 - *test* : description of the test used - In case of a multinomial test, the same columns except: - *p obs* instead of *statistic* : showing the probability of the observed sample table - *n combs.*, instead of *df* : showing the number of possible tables - no *minExp* and *propBelow5* column. Notes ----- None Before, After and Alternatives ------------------------------ Before this an omnibus test might be helpful, these are also the tests used on each pair: * [ts_pearson_gof](../tests/test_pearson_gof.html#ts_pearson_gof) for Pearson Chi-Square Goodness-of-Fit Test * [ts_freeman_tukey_gof](../tests/test_freeman_tukey_gof.html#ts_freeman_tukey_gof) for Freeman-Tukey Test of Goodness-of-Fit * [ts_freeman_tukey_read](../tests/test_freeman_tukey_read.html#ts_freeman_tukey_read) for Freeman-Tukey-Read Test of Goodness-of-Fit * [ts_g_gof](../tests/test_g_gof.html#ts_g_gof) for G (Likelihood Ratio) Goodness-of-Fit Test * [ts_mod_log_likelihood_gof](../tests/test_mod_log_likelihood_gof.html#ts_mod_log_likelihood_gof) for Mod-Log Likelihood Test of Goodness-of-Fit * [ts_multinomial_gof](../tests/test_multinomial_gof.html#ts_multinomial_gof) for Multinomial Goodness-of-Fit Test * [ts_neyman_gof](../tests/test_neyman_gof.html#ts_neyman_gof) for Neyman Test of Goodness-of-Fit * [ts_powerdivergence_gof](../tests/test_powerdivergence_gof.html#ts_powerdivergence_gof) for Power Divergence GoF Test After this you might want to add an effect size measure: * [es_post_hoc_gof](../effect_sizes/eff_size_post_hoc_gof.html#es_post_hoc_gof) for various effect sizes Alternative post-hoc tests: * [ph_pairwise_bin](../other/poho_pairwise_bin.html#ph_pairwise_bin) for Pairwise Binary Test * [ph_residual_gof_bin](../other/poho_residual_gof_bin.html#ph_residual_gof_bin) for Residuals Tests * [ph_residual_gof_gof](../other/poho_residual_gof_gof.html#ph_residual_gof_gof) for Residuals Using Goodness-of-Fit Tests More info on the adjustment for multiple testing: * [p_adjust](../other/p_adjustments.html#p_adjust) Author ------ Made by P. Stikker Companion website: https://PeterStatistics.com YouTube channel: https://www.youtube.com/stikpet Donations: https://www.patreon.com/bePatron?u=19398076 Examples -------- Examples: get data >>> import pandas as pd >>> pd.set_option('display.width',1000) >>> pd.set_option('display.max_columns', 1000) >>> gss_df = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv', sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'}) >>> ex1 = gss_df['mar1']; Example 1 using default settings: >>> ph_pairwise_gof(ex1) category 1 category 2 n1 n2 obs. prop. 1 exp. prop. 1 statistic df p-value adj. p-value minExp percBelow5 test 0 MARRIED NEVER MARRIED 972.0 395.0 0.711046 0.5 243.547184 1.0 6.626501e-55 6.626501e-54 683.5 0.0 Pearson chi-square test of goodness-of-fit 1 MARRIED DIVORCED 972.0 314.0 0.755832 0.5 336.674961 1.0 3.380767e-75 3.380767e-74 643.0 0.0 Pearson chi-square test of goodness-of-fit 2 MARRIED WIDOWED 972.0 181.0 0.843018 0.5 542.654814 1.0 4.987536e-120 4.987536e-119 576.5 0.0 Pearson chi-square test of goodness-of-fit 3 MARRIED SEPARATED 972.0 79.0 0.924833 0.5 758.752617 1.0 5.015130e-167 5.015130e-166 525.5 0.0 Pearson chi-square test of goodness-of-fit 4 NEVER MARRIED DIVORCED 395.0 314.0 0.557123 0.5 9.253879 1.0 2.349972e-03 2.349972e-02 354.5 0.0 Pearson chi-square test of goodness-of-fit 5 NEVER MARRIED WIDOWED 395.0 181.0 0.685764 0.5 79.506944 1.0 4.805333e-19 4.805333e-18 288.0 0.0 Pearson chi-square test of goodness-of-fit 6 NEVER MARRIED SEPARATED 395.0 79.0 0.833333 0.5 210.666667 1.0 9.826958e-48 9.826958e-47 237.0 0.0 Pearson chi-square test of goodness-of-fit 7 DIVORCED WIDOWED 314.0 181.0 0.634343 0.5 35.735354 1.0 2.260249e-09 2.260249e-08 247.5 0.0 Pearson chi-square test of goodness-of-fit 8 DIVORCED SEPARATED 314.0 79.0 0.798982 0.5 140.521628 1.0 2.047138e-32 2.047138e-31 196.5 0.0 Pearson chi-square test of goodness-of-fit 9 WIDOWED SEPARATED 181.0 79.0 0.696154 0.5 40.015385 1.0 2.519705e-10 2.519705e-09 130.0 0.0 Pearson chi-square test of goodness-of-fit Example 2 using a G test with Pearson correction: >>> ph_pairwise_gof(ex1, test="g", mtc='holm', cc='pearson') category 1 category 2 n1 n2 obs. prop. 1 exp. prop. 1 statistic df p-value adj. p-value minExp percBelow5 test 0 MARRIED NEVER MARRIED 972.0 395.0 0.711046 0.5 251.165891 1.0 1.446330e-56 1.012431e-55 683.5 0.0 G test of goodness-of-fit, and Pearson correction 1 MARRIED DIVORCED 972.0 314.0 0.755832 0.5 352.887628 1.0 9.961507e-79 7.969206e-78 643.0 0.0 G test of goodness-of-fit, and Pearson correction 2 MARRIED WIDOWED 972.0 181.0 0.843018 0.5 595.621443 1.0 1.500292e-131 1.350263e-130 576.5 0.0 G test of goodness-of-fit, and Pearson correction 3 MARRIED SEPARATED 972.0 79.0 0.924833 0.5 895.323655 1.0 1.019541e-196 1.019541e-195 525.5 0.0 G test of goodness-of-fit, and Pearson correction 4 NEVER MARRIED DIVORCED 395.0 314.0 0.557123 0.5 9.261034 1.0 2.340808e-03 2.340808e-03 354.5 0.0 G test of goodness-of-fit, and Pearson correction 5 NEVER MARRIED WIDOWED 395.0 181.0 0.685764 0.5 81.303809 1.0 1.935532e-19 7.742128e-19 288.0 0.0 G test of goodness-of-fit, and Pearson correction 6 NEVER MARRIED SEPARATED 395.0 79.0 0.833333 0.5 229.486329 1.0 7.715537e-52 4.629322e-51 237.0 0.0 G test of goodness-of-fit, and Pearson correction 7 DIVORCED WIDOWED 314.0 181.0 0.634343 0.5 36.105157 1.0 1.869514e-09 3.739029e-09 247.5 0.0 G test of goodness-of-fit, and Pearson correction 8 DIVORCED SEPARATED 314.0 79.0 0.798982 0.5 150.008211 1.0 1.726494e-34 8.632469e-34 196.5 0.0 G test of goodness-of-fit, and Pearson correction 9 WIDOWED SEPARATED 181.0 79.0 0.696154 0.5 40.952653 1.0 1.559621e-10 4.678864e-10 130.0 0.0 G test of goodness-of-fit, and Pearson correction ''' if type(data) is list: data = pd.Series(data) freq = data.value_counts() if expCount is None: #assume all to be equal n = sum(freq) k = len(freq) categories = list(freq.index) expC = [n/k] * k else: #check if categories match nE = 0 n = 0 for i in range(0, len(expCount)): nE = nE + expCount.iloc[i,1] n = n + freq[expCount.iloc[i,0]] expC = [] for i in range(0,len(expCount)): expC.append(expCount.iloc[i, 1]/nE*n) k = len(expC) categories = list(expCount.iloc[:,0]) n_pairs = int(k*(k-1)/2) results = pd.DataFrame() resRow=0 for i in range(0, k-1): for j in range(i+1, k): #category names results.at[resRow, 0] = categories[i] results.at[resRow, 1] = categories[j] #category sizes n1 = freq[categories[i]] n2 = freq[categories[j]] results.at[resRow, 2] = n1 results.at[resRow, 3] = n2 #data and expected counts expected_proportion_1 = expC[i]/n expected_proportion_2 = expC[j]/n exp_count_1 = (n1 + n2)*(expected_proportion_1*1/(expected_proportion_1+expected_proportion_2)) exp_count_2 = (n1 + n2)*(expected_proportion_2*1/(expected_proportion_1+expected_proportion_2)) exP = pd.DataFrame([[categories[i], exp_count_1], [categories[j], exp_count_2]], columns=['category', 'count']) results.at[resRow, 4] = n1/(n1 + n2) results.at[resRow, 5] = exp_count_1/(n1 + n2) pair = [categories[i], categories[j]] data_pair = data[data.isin(pair)] if test=="pearson": pair_test_result = ts_pearson_gof(data_pair, expCounts=exP, **kwargs) elif test=="freeman-tukey": pair_test_result = ts_freeman_tukey_gof(data_pair, expCounts=exP, **kwargs) elif test=="freeman-tukey-read": pair_test_result = ts_freeman_tukey_read(data_pair, expCounts=exP, **kwargs) elif test=="g": pair_test_result = ts_g_gof(data_pair, expCounts=exP, **kwargs) elif test=="mod-log-g": pair_test_result = ts_mod_log_likelihood_gof(data_pair, expCounts=exP, **kwargs) elif test=="neyman": pair_test_result = ts_neyman_gof(data_pair, expCounts=exP, **kwargs) elif test=="powerdivergence": pair_test_result = ts_powerdivergence_gof(data_pair, expCounts=exP, **kwargs) if test=="multinomial": pair_test_result = ts_multinomial_gof(data_pair, expCounts=exP, **kwargs) results.at[resRow, 6] = pair_test_result.iloc[0, 0] results.at[resRow, 7] = pair_test_result.iloc[0, 1] results.at[resRow, 8] = pair_test_result.iloc[0, 2] results.at[resRow, 9] = results.at[resRow, 8] results.at[resRow, 10] = pair_test_result.iloc[0, 3] else: results.at[resRow, 6] = pair_test_result.iloc[0, 2] results.at[resRow, 7] = pair_test_result.iloc[0, 3] results.at[resRow, 8] = pair_test_result.iloc[0, 4] results.at[resRow, 9] = results.at[resRow, 8] results.at[resRow, 10] = pair_test_result.iloc[0, 5] results.at[resRow, 11] = pair_test_result.iloc[0, 6] results.at[resRow, 12] = pair_test_result.iloc[0, 7] resRow = resRow + 1 results.iloc[:,9] = p_adjust(results.iloc[:,8], method=mtc) if test == "multinomial": # Set columns for multinomial case results.columns = [ "category 1", "category 2", "n1", "n2", "obs. prop. 1", "exp. prop. 1", "p obs", "n combs.", "p-value", "adj. p-value", "test" ] else: # Set columns for other cases results.columns = [ "category 1", "category 2", "n1", "n2", "obs. prop. 1", "exp. prop. 1", "statistic", "df", "p-value", "adj. p-value", "minExp", "percBelow5", "test" ] return results