Module stikpetP.tests.test_fligner_policello
import pandas as pd
from statistics import NormalDist
from ..other.table_cross import tab_cross
def ts_fligner_policello(catField, ordField, categories=None, levels=None, ties=True, cc=False):
'''
Fligner-Policello Test
----------------------
An alternative to the better-known Mann-Whitney U test. The MWU test assumes that the scores in the two categories have the same shape, and therefore equal variances (Fong & Huang, 2019). The Fligner-Policello test does not, although the distributions should be symmetric around their medians in the population (Zaiontz, n.d.).
Roughly put, the null hypothesis of this test is that the two categories have the same median in the population.
This function is shown in this [YouTube video](https://youtu.be/pPn59yaOYmg) and the test is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Tests/FlignerPolicello.html).
Parameters
----------
catField : pandas series
data with categories for the rows
ordField : pandas series
data with the scores (ordinal field)
categories : list or dictionary, optional
the two categories to use from catField. If not set the first two found will be used
levels : list or dictionary, optional
the scores in order
ties : boolean, optional
to indicate the use of a ties correction. Default is True
cc : boolean, optional
use of continuity correction. Default is False.
Returns
-------
A dataframe with:
* *n*, the sample size
* *statistic*, the z-value
* *p-value*, the significance (p-value)
* *test*, description of the test used
Notes
-----
The formula used is:
$$z = \\frac{N_X - N_Y}{2\\times\\sqrt{SS_X + SS_Y + M_X\\times M_Y}}$$
With:
$$SS_X = \\sum_{x\\in X} \\left(N\\left(x\\right) - M_X\\right)^2, SS_Y = \\sum_{y\\in Y} \\left(N\\left(y\\right) - M_Y\\right)^2$$
$$M_X = \\frac{N_X}{n_x}, M_Y = \\frac{N_Y}{n_y}$$
$$N_X = \\sum_{x \\in X} N\\left(x\\right), N_Y = \\sum_{y \\in Y} N\\left(y\\right)$$
$$N\\left(y\\right) = \\sum_{x\\in X} f\\left(y, x\\right)$$
$$N\\left(x\\right) = \\sum_{y\\in Y} f\\left(x, y\\right)$$
$$f\\left(a, b\\right) = \\begin{cases} 1 & \\text{ if } a> b \\\\ 0 & \\text{ if } a\\leq b \\end{cases}$$
In case of a continuity correction the numerator is adjusted to:
$$z = \\frac{\\left|N_X - N_Y\\right| - 0.5}{2\\times\\sqrt{SS_X + SS_Y + M_X\\times M_Y}}$$
In case of a ties correction (Hollander et al., 2014, p. 146) the scoring function becomes:
$$f\\left(a, b\\right) = \\begin{cases} 1 & \\text{ if } a > b \\\\ 0.5 & \\text{ if } a = b \\\\ 0 & \\text{ if } a < b \\end{cases}$$
*Symbols used:*
* \\(X\\) the scores in the first category
* \\(Y\\) the scores in the second category
* \\(n_i\\) the number of scores in the i-th category
The test is described by Fligner and Policello (1981), and can also be found in Kloke and McKean (2015, p. 68).
Before, After and Alternatives
------------------------------
Before running the test you might first want to get an impression using a cross table and/or some visualisation:
* [tab_cross](../other/table_cross.html#tab_cross)
* [vi_bar_stacked_multiple](../visualisations/vis_bar_stacked_multiple.html#vi_bar_stacked_multiple)
After the test you might want an effect size measure:
* [es_common_language_is](../effect_sizes/eff_size_common_language_is.html#es_common_language_is) for Common Language Effect Size
* [me_hodges_lehmann_is](../measures/meas_hodges_lehmann_is.html#me_hodges_lehmann_is) for Hodges-Lehmann estimate
* [r_rank_biserial_is](../correlations/cor_rank_biserial_is.html#r_rank_biserial_is) for (Glass) Rank Biserial (Cliff delta)
Alternatives for this test could be:
* [ts_mann_whitney](../tests/test_mann_whitney.html#ts_mann_whitney) for the Mann-Whitney U test
* [ts_brunner_munzel](../tests/test_brunner_munzel.html#ts_brunner_munzel) for the Brunner-Munzel test
* [ts_brunner_munzel_perm](../tests/test_brunner_munzel.html#ts_brunner_munzel_perm) for the Brunner-Munzel Permutation test
* [ts_c_square](../tests/test_c_square.html#ts_c_square) for the \\(C^2\\) test
References
----------
Fligner, M. A., & Policello, G. E. (1981). Robust rank procedures for the Behrens-Fisher problem. *Journal of the American Statistical Association, 76*(373), 162–168. https://doi.org/10.1080/01621459.1981.10477623
Fong, Y., & Huang, Y. (2019). Modified Wilcoxon–Mann–Whitney test and power against strong null. *The American Statistician, 73*(1), 43–49. https://doi.org/10.1080/00031305.2017.1328375
Hollander, M., Wolfe, D. A., & Chicken, E. (2014). *Nonparametric statistical methods* (3rd ed.). John Wiley & Sons, Inc.
Kloke, J., & McKean, J. W. (2015). *Nonparametric statistical methods using R*. CRC Press, Taylor & Francis.
Zaiontz, C. (n.d.). Fligner-Policello test. Real Statistics Using Excel. Retrieved July 24, 2023, from https://real-statistics.com/non-parametric-tests/fligner-policello-test/
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
--------
>>> file1 = "https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv"
>>> df1 = pd.read_csv(file1, sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> myLevels = {'Not scientific at all': 1, 'Not too scientific': 2, 'Pretty scientific': 3, 'Very scientific': 4}
>>> ts_fligner_policello(df1['sex'], df1['accntsci'], levels=myLevels)
n statistic p-value test
0 954 -0.524521 0.599917 Fligner-Policello test, with ties correction
>>> binary = ["apple", "apple", "apple", "peer", "peer", "peer", "peer"]
>>> ordinal = [4, 3, 1, 6, 5, 7, 2]
>>> ts_fligner_policello(binary, ordinal)
n statistic p-value test
0 7 -1.732051 0.083265 Fligner-Policello test, with ties correction
'''
ct = tab_cross(ordField, catField, order1=levels, order2=categories, totals="exclude")
nr = len(ct)
fx = []
fy = []
if ties:
fx.append(0.5*ct.iloc[0,1])
fy.append(0.5*ct.iloc[0,0])
else:
fx.append(0)
fy.append(0)
nx = fx[0]*ct.iloc[0,0]
ny = fy[0]*ct.iloc[0,1]
n1 = ct.iloc[0,0]
n2 = ct.iloc[0,1]
for i in range(1, nr):
fx.append(fx[i-1] + ct.iloc[i-1,1])
fy.append(fy[i - 1] + ct.iloc[i - 1, 0])
if (ties):
fx[i] = fx[i] + 0.5 * ct.iloc[i, 1] - 0.5 * ct.iloc[i-1,1]
fy[i] = fy[i] + 0.5 * ct.iloc[i, 0] - 0.5 * ct.iloc[i - 1, 0]
nx = nx + fx[i] * ct.iloc[i, 0]
n1 = n1 + ct.iloc[i, 0]
ny = ny + fy[i] * ct.iloc[i, 1]
n2 = n2 + ct.iloc[i, 1]
MX = nx / n1
MY = ny / n2
ssx = ssy = 0
for i in range(nr):
ssx = ssx + ct.iloc[i, 0] * (fx[i] - MX)**2
ssy = ssy + ct.iloc[i, 1] * (fy[i] - MY)**2
if cc:
num = abs(nx - ny) - 0.5
else:
num = nx - ny
z = num / (2 * (ssx + ssy + MX * MY)**0.5)
pValue = 2 * (1 - NormalDist().cdf(abs(z)))
if (cc and ties):
testUsed = "Fligner-Policello test, with continuity and ties correction"
elif (cc):
testUsed = "Fligner-Policello test, with continuity correction"
elif (ties):
testUsed = "Fligner-Policello test, with ties correction"
else:
testUsed = "Fligner-Policello test"
#the results
colNames = ["n", "statistic", "p-value", "test"]
results = pd.DataFrame([[n1+n2, z, pValue, testUsed]], columns=colNames)
return results
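The cumulative loop above derives the placement counts \(N(x)\) and \(N(y)\) from the cross table. As a sanity check, the same statistic can be computed directly from the raw scores with the defining formulas. This is a minimal sketch, not part of the package, and `fp_z_sketch` is a hypothetical helper name:

```python
from statistics import NormalDist

def fp_z_sketch(x, y, ties=True, cc=False):
    # f(a, b): 1 if a beats b; with the ties correction a tie counts as 0.5
    if ties:
        f = lambda a, b: 1.0 if a > b else (0.5 if a == b else 0.0)
    else:
        f = lambda a, b: 1.0 if a > b else 0.0
    px = [sum(f(a, b) for b in y) for a in x]  # placements N(x)
    py = [sum(f(b, a) for a in x) for b in y]  # placements N(y)
    nx, ny = sum(px), sum(py)                  # N_X, N_Y
    mx, my = nx / len(x), ny / len(y)          # M_X, M_Y
    ssx = sum((p - mx) ** 2 for p in px)       # SS_X
    ssy = sum((p - my) ** 2 for p in py)       # SS_Y
    num = abs(nx - ny) - 0.5 if cc else nx - ny
    z = num / (2 * (ssx + ssy + mx * my) ** 0.5)
    return z, 2 * (1 - NormalDist().cdf(abs(z)))
```

On the small example from the docstring (`[4, 3, 1]` versus `[6, 5, 7, 2]`) this yields the same z of about -1.732 as `ts_fligner_policello`.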