Module stikpetP.tests.test_fligner_policello
import pandas as pd
from statistics import NormalDist
from ..other.table_cross import tab_cross
def ts_fligner_policello(catField, ordField, categories=None, levels=None, ties=True, cc=False):
'''
Fligner-Policello Test
----------------------
An alternative to the better-known Mann-Whitney U test. The MWU test assumes that the scores in the two categories have the same shape, and therefore equal variances (Fong & Huang, 2019). The Fligner-Policello test does not, although the distributions should be symmetric around their medians in the population (Zaiontz, n.d.).
Roughly put, the null hypothesis of this test is that the two categories have the same median in the population.
This function is shown in this [YouTube video](https://youtu.be/pPn59yaOYmg) and the test is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Tests/FlignerPolicello.html).
Parameters
----------
catField : pandas series
data with categories for the rows
ordField : pandas series
data with the scores (ordinal field)
categories : list or dictionary, optional
the two categories to use from catField. If not set the first two found will be used
levels : list or dictionary, optional
the scores in order
ties : boolean, optional
to indicate the use of a ties correction. Default is True
cc : boolean, optional
use of continuity correction. Default is False.
Returns
-------
A dataframe with:
* *n*, the sample size
* *statistic*, the z-value
* *p-value*, the significance (p-value)
* *test*, description of the test used
Notes
-----
The formula used is:
$$z = \\frac{N_X - N_Y}{2\\times\\sqrt{SS_X + SS_Y + M_X\\times M_Y}}$$
With:
$$SS_X = \\sum_{x\\in X} \\left(N\\left(x\\right) - M_X\\right)^2, SS_Y = \\sum_{y\\in Y} \\left(N\\left(y\\right) - M_Y\\right)^2$$
$$M_X = \\frac{N_X}{n_x}, M_Y = \\frac{N_Y}{n_y}$$
$$N_X = \\sum_{x \\in X} N\\left(x\\right), N_Y = \\sum_{y \\in Y} N\\left(y\\right)$$
$$N\\left(y\\right) = \\sum_{x\\in X} f\\left(y, x\\right)$$
$$N\\left(x\\right) = \\sum_{y\\in Y} f\\left(x, y\\right)$$
$$f\\left(a, b\\right) = \\begin{cases} 1 & \\text{ if } a> b \\\\ 0 & \\text{ if } a\\leq b \\end{cases}$$
In case of a continuity correction the numerator is adjusted to:
$$z = \\frac{\\left|N_X - N_Y\\right| - 0.5}{2\\times\\sqrt{SS_X + SS_Y + M_X\\times M_Y}}$$
In case of a ties correction (Hollander et al., 2014, p. 146) the scoring function becomes:
$$f\\left(a, b\\right) = \\begin{cases} 1 & \\text{ if } a > b \\\\ 0.5 & \\text{ if } a = b \\\\ 0 & \\text{ if } a < b \\end{cases}$$
*Symbols used:*
* \\(X\\) the scores in the first category
* \\(Y\\) the scores in the second category
* \\(n_i\\) the number of scores in the i-th category
The test is described by Fligner and Policello (1981), and can also be found in Kloke and McKean (2015, p. 68).
Before, After and Alternatives
------------------------------
Before running the test you might first want to get an impression using a cross table and/or some visualisation:
* [tab_cross](../other/table_cross.html#tab_cross)
* [vi_bar_stacked_multiple](../visualisations/vis_bar_stacked_multiple.html#vi_bar_stacked_multiple)
After the test you might want an effect size measure:
* [es_common_language_is](../effect_sizes/eff_size_common_language_is.html#es_common_language_is) for Common Language Effect Size
* [me_hodges_lehmann_is](../measures/meas_hodges_lehmann_is.html#me_hodges_lehmann_is) for Hodges-Lehmann estimate
* [r_rank_biserial_is](../correlations/cor_rank_biserial_is.html#r_rank_biserial_is) for (Glass) Rank Biserial (Cliff delta)
Alternatives for this test could be:
* [ts_mann_whitney](../tests/test_mann_whitney.html#ts_mann_whitney) for the Mann-Whitney U test
* [ts_brunner_munzel](../tests/test_brunner_munzel.html#ts_brunner_munzel) for the Brunner-Munzel test
* [ts_brunner_munzel_perm](../tests/test_brunner_munzel.html#ts_brunner_munzel_perm) for the Brunner-Munzel Permutation test
* [ts_c_square](../tests/test_c_square.html#ts_c_square) for the \\(C^2\\) test
References
----------
Fligner, M. A., & Policello, G. E. (1981). Robust rank procedures for the Behrens-Fisher problem. *Journal of the American Statistical Association, 76*(373), 162–168. https://doi.org/10.1080/01621459.1981.10477623
Fong, Y., & Huang, Y. (2019). Modified Wilcoxon–Mann–Whitney test and power against strong null. *The American Statistician, 73*(1), 43–49. https://doi.org/10.1080/00031305.2017.1328375
Hollander, M., Wolfe, D. A., & Chicken, E. (2014). *Nonparametric statistical methods* (3rd ed.). John Wiley & Sons, Inc.
Kloke, J., & McKean, J. W. (2015). *Nonparametric statistical methods using R*. CRC Press, Taylor & Francis.
Zaiontz, C. (n.d.). Fligner-Policello test. Real Statistics Using Excel. Retrieved July 24, 2023, from https://real-statistics.com/non-parametric-tests/fligner-policello-test/
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
--------
>>> file1 = "https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv"
>>> df1 = pd.read_csv(file1, sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> myLevels = {'Not scientific at all': 1, 'Not too scientific': 2, 'Pretty scientific': 3, 'Very scientific': 4}
>>> ts_fligner_policello(df1['sex'], df1['accntsci'], levels=myLevels)
n statistic p-value test
0 954 -0.524521 0.599917 Fligner-Policello test, with ties correction
>>> binary = ["apple", "apple", "apple", "peer", "peer", "peer", "peer"]
>>> ordinal = [4, 3, 1, 6, 5, 7, 2]
>>> ts_fligner_policello(binary, ordinal)
n statistic p-value test
0 7 -1.732051 0.083265 Fligner-Policello test, with ties correction
'''
ct = tab_cross(ordField, catField, order1=levels, order2=categories, totals="exclude")
nr = len(ct)
fx = []
fy = []
if ties:
fx.append(0.5*ct.iloc[0,1])
fy.append(0.5*ct.iloc[0,0])
else:
fx.append(0)
fy.append(0)
nx = fx[0]*ct.iloc[0,0]
ny = fy[0]*ct.iloc[0,1]
n1 = ct.iloc[0,0]
n2 = ct.iloc[0,1]
for i in range(1, nr):
fx.append(fx[i-1] + ct.iloc[i-1,1])
fy.append(fy[i - 1] + ct.iloc[i - 1, 0])
if (ties):
fx[i] = fx[i] + 0.5 * ct.iloc[i, 1] - 0.5 * ct.iloc[i-1,1]
fy[i] = fy[i] + 0.5 * ct.iloc[i, 0] - 0.5 * ct.iloc[i - 1, 0]
nx = nx + fx[i] * ct.iloc[i, 0]
n1 = n1 + ct.iloc[i, 0]
ny = ny + fy[i] * ct.iloc[i, 1]
n2 = n2 + ct.iloc[i, 1]
MX = nx / n1
MY = ny / n2
ssx = ssy = 0
for i in range(nr):
ssx = ssx + ct.iloc[i, 0] * (fx[i] - MX)**2
ssy = ssy + ct.iloc[i, 1] * (fy[i] - MY)**2
if cc:
num = abs(nx - ny) - 0.5
else:
num = nx - ny
z = num / (2 * (ssx + ssy + MX * MY)**0.5)
pValue = 2 * (1 - NormalDist().cdf(abs(z)))
if (cc and ties):
testUsed = "Fligner-Policello test, with continuity and ties correction"
elif (cc):
testUsed = "Fligner-Policello test, with continuity correction"
elif (ties):
testUsed = "Fligner-Policello test, with ties correction"
else:
testUsed = "Fligner-Policello test"
#the results
colNames = ["n", "statistic", "p-value", "test"]
results = pd.DataFrame([[n1+n2, z, pValue, testUsed]], columns=colNames)
return results
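The cumulative loop above derives the placement counts \(N(x)\) and \(N(y)\) from the cross table. As a sanity check, the same statistic can be computed directly from the raw scores with the defining formulas. This is a minimal sketch, not part of the package, and `fp_z_sketch` is a hypothetical helper name:

```python
from statistics import NormalDist

def fp_z_sketch(x, y, ties=True, cc=False):
    # f(a, b): 1 if a beats b; with the ties correction a tie counts as 0.5
    if ties:
        f = lambda a, b: 1.0 if a > b else (0.5 if a == b else 0.0)
    else:
        f = lambda a, b: 1.0 if a > b else 0.0
    px = [sum(f(a, b) for b in y) for a in x]  # placements N(x)
    py = [sum(f(b, a) for a in x) for b in y]  # placements N(y)
    nx, ny = sum(px), sum(py)                  # N_X, N_Y
    mx, my = nx / len(x), ny / len(y)          # M_X, M_Y
    ssx = sum((p - mx) ** 2 for p in px)       # SS_X
    ssy = sum((p - my) ** 2 for p in py)       # SS_Y
    num = abs(nx - ny) - 0.5 if cc else nx - ny
    z = num / (2 * (ssx + ssy + mx * my) ** 0.5)
    return z, 2 * (1 - NormalDist().cdf(abs(z)))
```

On the small example from the docstring (`[4, 3, 1]` versus `[6, 5, 7, 2]`) this yields the same z of about -1.732 as `ts_fligner_policello`.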