Module stikpetP.measures.meas_quantiles

import pandas as pd
import math
from ..helper.help_quantileIndex import he_quantileIndex

def me_quantiles(data, levels=None, k=4, method="own", indexMethod="sas1", qLfrac="linear", qLint="int", qHfrac="linear", qHint="int"):
    '''
    Quantiles
    ---------
    
    Quantiles split the data into k sections, each containing n/k scores. They can be seen as a generalisation of various 'tiles'. For example 4-quantiles is the same as the quartiles, 5-quantiles the same as quintiles, 100-quantiles the same as percentiles, etc.
    
    Quite a few different methods exist to determine these. See the notes for more information.

    This function is shown in this [YouTube video](https://youtu.be/iI07nJ3wlOQ) and the measure is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Measures/Quantiles.html)
    
    Parameters
    ----------
    data : list or pandas series
    levels : dictionary, optional 
        coding to use
    k : int, optional
        number of quantiles. Default is 4
    method : string, optional 
        which method to use to calculate quantiles
    indexMethod : {"sas1", "sas4", "excel", "hl", "hf8", "hf9"}, optional 
        to indicate which type of indexing to use. Default is "sas1"
    qLfrac : {"linear", "down", "up", "bankers", "nearest", "halfdown", "midpoint"}, optional 
        to indicate what type of rounding to use for quantiles below 50 percent. Default is "linear"
    qLint : {"int", "midpoint"}, optional 
        to indicate the use of the integer or the midpoint method for quantiles below 50 percent. Default is "int"
    qHfrac : {"linear", "down", "up", "bankers", "nearest", "halfdown", "midpoint"}, optional 
        to indicate what type of rounding to use for quantiles equal or above 50 percent. Default is "linear"
    qHint : {"int", "midpoint"}, optional  
        to indicate the use of the integer or the midpoint method for quantiles equal or above 50 percent. Default is "int"
    
    method can be set to "own", in which case the parameters that follow it (indexMethod, qLfrac, qLint, qHfrac, qHint) are used, or to any of the methods listed in the notes.
    
    Returns
    -------
    results : pandas Series with the quantiles or, if levels are used, a tuple of that Series and a list with the text versions
    
    Notes
    -----
    To determine the quantiles a specific indexing method can be used. See **he_quantileIndexing()** for details on the different methods to choose from.
    
    Then, based on the indexes, either linear interpolation or one of the rounding methods (bankers, nearest, down, up, half-down) can be used, or the midpoint between the two values. If the index is an integer, either the value at that index or the midpoint is used. 
    
    See **he_quantileIndex()** for details on this.
    
    Note that the rounding method can even vary per quantile: the one used for quantiles below the median can differ from the one used for quantiles equal to or above it.

    I've come across the following methods:

    |method|indexing|q1 integer|q1 fractional|q3 integer|q3 fractional|
    |------|--------|----------|-------------|----------|-------------|
    |sas1|sas1|use int|linear|use int|linear|
    |sas2|sas1|use int|bankers|use int|bankers|
    |sas3|sas1|use int|up|use int|up|
    |sas5|sas1|midpoint|up|midpoint|up|
    |hf3b|sas1|use int|nearest|use int|halfdown|
    |sas4|sas4|use int|linear|use int|linear|
    |ms|sas4|use int|nearest|use int|halfdown|
    |lohninger|sas4|use int|nearest|use int|nearest|
    |hl2|hl|use int|linear|use int|linear|
    |hl1|hl|use int|midpoint|use int|midpoint|
    |excel|excel|use int|linear|use int|linear|
    |pd2|excel|use int|down|use int|down|
    |pd3|excel|use int|up|use int|up|
    |pd4|excel|use int|halfdown|use int|nearest|
    |pd5|excel|use int|midpoint|use int|midpoint|
    |hf8|hf8|use int|linear|use int|linear|
    |hf9|hf9|use int|linear|use int|linear|
    |maple2|hl|use int|down|use int|down|

    The following values can be used for the *method* parameter:

    1. sas1 = parzen = hf4 = interpolated_inverted_cdf = maple3 = r4 (Parzen, 1979, p. 108; SAS, 1990, p. 626; Hyndman & Fan, 1996, p. 363)
    1. sas2 = hf3 = r3 (SAS, 1990, p. 626; Hyndman & Fan, 1996, p. 362)
    1. sas3 = hf1 = inverted_cdf = maple1 = r1 (SAS, 1990, p. 626; Hyndman & Fan, 1996, p. 362)
    1. sas4 = hf6 = minitab = snedecor = weibull = maple5 = r6 (Hyndman & Fan, 1996, p. 363; Weibull, 1939, p. ?; Snedecor, 1940, p. 43; SAS, 1990, p. 626)
    1. sas5 = hf2 = cdf = averaged_inverted_cdf = r2 (SAS, 1990, p. 626; Hyndman & Fan, 1996, p. 362)
    1. hf3b = closest_observation
    1. ms (Mendenhall & Sincich, 1992, p. 35)
    1. lohninger (Lohninger, n.d.)
    1. hl1 (Hogg & Ledolter, 1992, p. 21)
    1. hl2 = hf5 = hazen = maple4 = r5 (Hogg & Ledolter, 1992, p. 21; Hazen, 1914, p. ?)
    1. maple2
    1. excel = hf7 = pd1 = linear = gumbel = maple6 = r7 (Hyndman & Fan, 1996, p. 363; Freund & Perles, 1987, p. 201; Gumbel, 1939, p. ?)
    1. pd2 = lower
    1. pd3 = higher
    1. pd4 = nearest
    1. pd5 = midpoint
    1. hf8 = median_unbiased = maple7 = r8 (Hyndman & Fan, 1996, p. 363)
    1. hf9 = normal_unbiased = maple8 = r9 (Hyndman & Fan, 1996, p. 363)

    *hf* is short for Hyndman and Fan, who wrote an article showcasing many different methods, *hl* is short for Hogg and Ledolter, *ms* is short for Mendenhall and Sincich, *jf* is short for Joarder and Firozzaman. *sas* refers to the software package SAS, *maple* to Maple, *pd* to Python's pandas library, and *r* to R.
    
    The names *linear*, *lower*, *higher*, *nearest* and *midpoint* are all used by pandas quantile function and numpy percentile function. Numpy also uses *inverted_cdf*, *averaged_inverted_cdf*, *closest_observation*, *interpolated_inverted_cdf*, *hazen*, *weibull*, *median_unbiased*, and *normal_unbiased*. 

    Before, After and Alternatives
    ------------------------------
    Before this measure you might want an impression using a frequency table or a visualisation:
    * [tab_frequency](../other/table_frequency.html#tab_frequency) for a frequency table
    * [vi_bar_stacked_single](../visualisations/vis_bar_stacked_single.html#vi_bar_stacked_single) for Single Stacked Bar-Chart
    * [vi_bar_dual_axis](../visualisations/vis_bar_dual_axis.html#vi_bar_dual_axis) for Dual-Axis Bar Chart

    After this you might want some other descriptive measures:
    * [me_consensus](../measures/meas_consensus.html#me_consensus) for the Consensus
    * [me_hodges_lehmann_os](../measures/meas_hodges_lehmann_os.html#me_hodges_lehmann_os) for the Hodges-Lehmann Estimate (One-Sample)
    * [me_median](../measures/meas_median.html#me_median) for the Median
    * [me_quartiles](../measures/meas_quartiles.html#me_quartiles) for Quartiles / Hinges
    * [me_quartile_range](../measures/meas_quartile_range.html#me_quartile_range) for Interquartile Range, Semi-Interquartile Range and Mid-Quartile Range
    
    or perform a test:
    * [ts_sign_os](../tests/test_sign_os.html#ts_sign_os) for One-Sample Sign Test
    * [ts_trinomial_os](../tests/test_trinomial_os.html#ts_trinomial_os) for One-Sample Trinomial Test
    * [ts_wilcoxon_os](../tests/test_wilcoxon_os.html#ts_wilcoxon_os) for Wilcoxon Signed Rank Test (One-Sample)

    For more information on the quantile indexing methods and the index itself:
    * [he_quantileIndexing](../helper/help_quantileIndexing.html#he_quantileIndexing)
    * [he_quantileIndex](../helper/help_quantileIndex.html#he_quantileIndex)
    
    References
    ----------
    Freund, J. E., & Perles, B. M. (1987). A new look at quartiles of ungrouped data. *The American Statistician, 41*(3), 200–203. doi:10.1080/00031305.1987.10475479

    Galton, F. (1881). Report of the anthropometric committee. *Report of the British Association for the Advancement of Science, 51*, 225–272.

    Gumbel, E. J. (1939). La Probabilité des Hypothèses. *Comptes Rendus de l’Académie des Sciences, 209*, 645–647.

    Hazen, A. (1914). Storage to be provided in impounding municipal water supply. *Transactions of the American Society of Civil Engineers, 77*(1), 1539–1640. doi:10.1061/taceat.0002563

    Hogg, R. V., & Ledolter, J. (1992). *Applied statistics for engineers and physical scientists* (2nd int.). Macmillan.

    Hyndman, R. J., & Fan, Y. (1996). Sample quantiles in statistical packages. *The American Statistician, 50*(4), 361–365. doi:10.2307/2684934

    Langford, E. (2006). Quartiles in elementary statistics. *Journal of Statistics Education, 14*(3), 1–17. doi:10.1080/10691898.2006.11910589

    Lohninger, H. (n.d.). Quartile. Fundamentals of Statistics. Retrieved April 7, 2023, from http://www.statistics4u.com/fundstat_eng/cc_quartile.html

    McAlister, D. (1879). The law of the geometric mean. *Proceedings of the Royal Society of London, 29*(196–199), 367–376. doi:10.1098/rspl.1879.0061

    Mendenhall, W., & Sincich, T. (1992). *Statistics for engineering and the sciences* (3rd ed.). Dellen Publishing Company.

    Parzen, E. (1979). Nonparametric statistical data modeling. *Journal of the American Statistical Association, 74*(365), 105–121. doi:10.1080/01621459.1979.10481621

    SAS. (1990). SAS procedures guide: Version 6 (3rd ed.). SAS Institute.

    Siegel, A. F., & Morgan, C. J. (1996). *Statistics and data analysis: An introduction* (2nd ed.). J. Wiley.

    Snedecor, G. W. (1940). *Statistical methods applied to experiments in agriculture and biology* (3rd ed.). The Iowa State College Press.

    Vining, G. G. (1998). *Statistical methods for engineers*. Duxbury Press.

    Weibull, W. (1939). *The phenomenon of rupture in solids*. Ingeniörs Vetenskaps Akademien, 153, 1–55.
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076


    Examples
    --------
    Example 1: Text Pandas Series
    >>> import pandas as pd
    >>> student_df = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv', sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
    >>> ex1 = student_df['Teach_Motivate']
    >>> order = {"Fully Disagree":1, "Disagree":2, "Neither disagree nor agree":3, "Agree":4, "Fully agree":5}
    >>> me_quantiles(ex1, levels=order)
    (0    1.0
    1    1.0
    2    2.0
    3    3.0
    4    5.0
    dtype: float64, ['Fully Disagree', 'Fully Disagree', 'Disagree', 'Neither disagree nor agree', 'Fully agree'])
    
    Example 2: Numeric data
    >>> ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
    >>> me_quantiles(ex2)
    0    1.0
    1    2.0
    2    4.0
    3    5.0
    4    5.0
    dtype: float64
    
    Example 3: Text data
    >>> ex3 = ["a", "b", "f", "d", "e", "c"]
    >>> order = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6}
    >>> me_quantiles(ex3, levels=order)
    (0    1.0
    1    1.5
    2    3.0
    3    4.5
    4    6.0
    dtype: float64, ['a', 'between a and b', 'c', 'between d and e', 'f'])
    
    '''
    if type(data) is list:        
        data = pd.Series(data)
        
    data = data.dropna()
    if levels is not None:
        pd.set_option('future.no_silent_downcasting', True)
        dataN = data.map(levels).astype('Int8')
    else:
        dataN = pd.to_numeric(data)
    
    dataN = dataN.sort_values().reset_index(drop=True)
    #dataN = list(dataN)
    
    #alternative namings
    if method in ["cdf", "sas5", "hf2", "averaged_inverted_cdf", "r2"]:
        method = "sas5"
    elif method in ["sas4", "minitab", "hf6", "snedecor", "weibull", "maple5", "r6"]:
        method = "sas4"
    elif method in ["excel", "hf7", "pd1", "linear", "gumbel", "maple6", "r7"]:
        method = "excel"
    elif method in ["sas1", "parzen", "hf4", "interpolated_inverted_cdf", "maple3", "r4"]:
        method = "sas1"
    elif method in ["sas2", "hf3", "r3"]:
        method = "sas2"
    elif method in ["sas3", "hf1", "inverted_cdf", "maple1", "r1"]:
        method = "sas3"
    elif method in ["hf3b", "closest_observation"]:
        method = "hf3b"
    elif method in ["hl2", "hazen", "hf5", "maple4", "r5"]:
        method = "hl2"
    elif method in ["np", "midpoint", "pd5"]:
        method = "pd5"
    elif method in ["hf8", "median_unbiased", "maple7", "r8"]:
        method = "hf8"
    elif method in ["hf9", "normal_unbiased", "maple8", "r9"]:
        method = "hf9"
    elif method in ["pd2", "lower"]:
        method = "pd2"
    elif method in ["pd3", "higher"]:
        method = "pd3"
    elif method in ["pd4", "nearest"]:
        method = "pd4"
    
    #settings
    settings = [indexMethod, qLfrac, qLint, qHfrac, qHint]
    if method=="sas1":
        settings = ["sas1","linear","int","linear","int"]
    elif method=="sas2":
        settings = ["sas1","bankers","int","bankers" ,"int"]
    elif method=="sas3":
        settings = ["sas1","up","int","up","int"]
    elif method=="sas5":
        settings = ["sas1","up","midpoint","up","midpoint"]
    elif method=="sas4":    
        settings = ["sas4","linear", "int","linear","int"]
    elif method=="ms": 
        settings = ["sas4", "nearest","int", "halfdown","int"]
    elif method=="lohninger":
        settings = ["sas4", "nearest", "int","nearest","int"]
    elif method=="hl2":
        settings = ["hl", "linear", "int","linear","int"]
    elif method=="hl1":
        settings = ["hl", "midpoint","int", "midpoint","int"]
    elif method=="excel":
        settings = ["excel", "linear","int","linear", "int"]
    elif method=="pd2":
        settings = ["excel", "down", "int", "down","int"]
    elif method=="pd3":
        settings = ["excel", "up","int","up","int"]
    elif method=="pd4":
        settings = ["excel", "halfdown",  "int","nearest", "int"]
    elif method=="hf3b":
        settings = ["sas1", "nearest","int","halfdown","int"]
    elif method=="pd5":
        settings = ["excel", "midpoint","int","midpoint","int"]
    elif method=="hf8":
        settings = ["hf8", "linear","int","linear", "int"]
    elif method=="hf9":
        settings = ["hf9", "linear","int","linear", "int"]
    elif method=="maple2":
        settings = ["hl", "down","int","down", "int"]
    
    quantiles = he_quantileIndex(dataN, k, settings[0], settings[1], settings[2], settings[3], settings[4])
    #he_quantileIndex(data, k=4, indexMethod="sas1", qLfrac="linear", qLint="int", qHfrac="linear", qHint="int")
    #find the text representatives
    if levels is not None:
        quantilesText = []
        for i in range(k+1):
            if quantiles[i] == round(quantiles[i]):
                qT = list(levels.keys())[list(levels.values()).index(quantiles[i])]

            else:
                qT = "between " + list(levels.keys())[list(levels.values()).index(math.floor(quantiles[i]))] + " and " + list(levels.keys())[list(levels.values()).index(math.ceil(quantiles[i]))]
            quantilesText.append(qT)
            
        results = quantiles, quantilesText
    else:
        results = quantiles
    
    return results

Functions

def me_quantiles(data, levels=None, k=4, method='own', indexMethod='sas1', qLfrac='linear', qLint='int', qHfrac='linear', qHint='int')

Quantiles

Quantiles split the data into k sections, each containing n/k scores. They can be seen as a generalisation of various 'tiles'. For example 4-quantiles is the same as the quartiles, 5-quantiles the same as quintiles, 100-quantiles the same as percentiles, etc.

Quite a few different methods exist to determine these. See the notes for more information.
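
As a small illustration of how much the choice of method can matter, Python's standard library offers two conventions through `statistics.quantiles`. This sketch is independent of `me_quantiles`. The `exclusive` option places the cut points at position i(n+1)/k and the `inclusive` option at 1 + i(n-1)/k; that these line up with the *hf6* and *hf7* entries in the notes is an assumption, not something checked against this package.

```python
import statistics

data = [1, 2, 3, 4, 5, 6]

# n=4 asks for the three inner cut points (the quartiles); the minimum and
# maximum are not part of the result.
exclusive = statistics.quantiles(data, n=4, method="exclusive")
inclusive = statistics.quantiles(data, n=4, method="inclusive")

print(exclusive)  # [1.75, 3.5, 5.25]
print(inclusive)  # [2.25, 3.5, 4.75]
```

The median agrees, but the first and third quartile differ between the two conventions.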

This function is shown in this YouTube video and the measure is also described at PeterStatistics.com

Parameters

data : list or pandas series
levels : dictionary, optional
    coding to use
k : int, optional
    number of quantiles. Default is 4
method : string, optional
    which method to use to calculate quantiles
indexMethod : {"sas1", "sas4", "excel", "hl", "hf8", "hf9"}, optional
    to indicate which type of indexing to use. Default is "sas1"
qLfrac : {"linear", "down", "up", "bankers", "nearest", "halfdown", "midpoint"}, optional
    to indicate what type of rounding to use for quantiles below 50 percent. Default is "linear"
qLint : {"int", "midpoint"}, optional
    to indicate the use of the integer or the midpoint method for quantiles below 50 percent. Default is "int"
qHfrac : {"linear", "down", "up", "bankers", "nearest", "halfdown", "midpoint"}, optional
    to indicate what type of rounding to use for quantiles equal or above 50 percent. Default is "linear"
qHint : {"int", "midpoint"}, optional
    to indicate the use of the integer or the midpoint method for quantiles equal or above 50 percent. Default is "int"

method can be set to "own", in which case the parameters that follow it (indexMethod, qLfrac, qLint, qHfrac, qHint) are used, or to any of the methods listed in the notes.
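
The relation between a named method and the five settings can be sketched as a lookup with a fallback. The names `NAMED_METHODS` and `resolve_settings` are hypothetical and only a few rows of the methods table in the notes are copied; the fallback to the caller's own settings mirrors what the source listing does for `method="own"`.

```python
# Settings per named method, in the order
# (indexMethod, qLfrac, qLint, qHfrac, qHint).
NAMED_METHODS = {
    "sas1": ("sas1", "linear", "int", "linear", "int"),
    "sas2": ("sas1", "bankers", "int", "bankers", "int"),
    "sas5": ("sas1", "up", "midpoint", "up", "midpoint"),
    "excel": ("excel", "linear", "int", "linear", "int"),
}

def resolve_settings(method="own", indexMethod="sas1", qLfrac="linear",
                     qLint="int", qHfrac="linear", qHint="int"):
    """Return the preset for a named method, else the settings passed in."""
    own = (indexMethod, qLfrac, qLint, qHfrac, qHint)
    return NAMED_METHODS.get(method, own)

print(resolve_settings("sas2"))
print(resolve_settings("own", indexMethod="hl", qLfrac="down"))
```

With a named method the five explicit parameters are ignored; with "own" they are passed through unchanged.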

Returns

results : pandas Series with the quantiles or, if levels are used, a tuple of that Series and a list with the text versions
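
How the text versions relate to the numeric quantiles can be sketched as a reverse lookup in the levels dictionary. `label_for` is a hypothetical helper written for this illustration; it follows the same idea as the source listing (whole numbers map to one label, other values to "between ... and ...").

```python
import math

def label_for(value, levels):
    """Translate a numeric quantile back to text using the levels coding."""
    by_code = {code: label for label, code in levels.items()}
    if value == round(value):
        return by_code[int(value)]
    return "between " + by_code[math.floor(value)] + " and " + by_code[math.ceil(value)]

order = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6}
labels = [label_for(q, order) for q in [1.0, 1.5, 3.0, 4.5, 6.0]]
print(labels)
# ['a', 'between a and b', 'c', 'between d and e', 'f']
```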
 

Notes

To determine the quantiles a specific indexing method can be used. See he_quantileIndexing() for details on the different methods to choose from.
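
To give an idea of what an indexing method does, the sketch below uses the position formulas from Hyndman and Fan (1996) for their types 4 to 9, matched to the index names through the aliases listed further on (sas1 = hf4, hl = hf5, sas4 = hf6, excel = hf7). The function name `quantile_position` is hypothetical, and whether he_quantileIndexing() uses exactly these formulas, or how it treats positions below 1 or above n, is an assumption that is not verified here.

```python
def quantile_position(n, p, index_method="sas1"):
    """1-based, possibly fractional, position of the p-quantile among n sorted scores."""
    positions = {
        "sas1": n * p,                     # Hyndman-Fan type 4
        "hl": n * p + 0.5,                 # type 5
        "sas4": (n + 1) * p,               # type 6
        "excel": (n - 1) * p + 1,          # type 7
        "hf8": (n + 1 / 3) * p + 1 / 3,    # type 8
        "hf9": (n + 1 / 4) * p + 3 / 8,    # type 9
    }
    return positions[index_method]

# Position of the first quartile among 18 scores under each indexing method.
for name in ["sas1", "hl", "sas4", "excel", "hf8", "hf9"]:
    print(name, quantile_position(18, 0.25, name))
```

For 18 scores the first quartile lands somewhere between position 4.5 and 5.25, depending on the indexing method.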

Then, based on the indexes, either linear interpolation or one of the rounding methods (bankers, nearest, down, up, half-down) can be used, or the midpoint between the two values. If the index is an integer, either the value at that index or the midpoint is used.
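
A minimal sketch of this step, assuming a 1-based position between 1 and n. The function `value_at_position` is hypothetical, and the exact meaning of each rule is an assumption: "nearest" is taken as rounding halves up, "halfdown" as rounding halves down, "bankers" as rounding halves to the even position, and the integer "midpoint" as the average of the score at the position and the next one. he_quantileIndex may define them differently.

```python
import math

def value_at_position(sorted_data, pos, frac_rule="linear", int_rule="int"):
    """Value for a 1-based position that may be fractional."""
    n = len(sorted_data)
    lo, hi = math.floor(pos), math.ceil(pos)
    if lo == hi:
        # Integer position: take that score, or (assumed meaning of
        # "midpoint") average it with the next score when there is one.
        if int_rule == "midpoint" and lo < n:
            return (sorted_data[lo - 1] + sorted_data[lo]) / 2
        return float(sorted_data[lo - 1])

    low_val, high_val = sorted_data[lo - 1], sorted_data[hi - 1]
    frac = pos - lo
    if frac_rule == "linear":
        return low_val + frac * (high_val - low_val)
    if frac_rule == "midpoint":
        return (low_val + high_val) / 2

    # The remaining rules round the position to a whole number first.
    if frac_rule == "down":
        pick = lo
    elif frac_rule == "up":
        pick = hi
    elif frac_rule == "nearest":      # assumed: halves round up
        pick = hi if frac >= 0.5 else lo
    elif frac_rule == "halfdown":     # assumed: halves round down
        pick = hi if frac > 0.5 else lo
    elif frac_rule == "bankers":      # assumed: halves go to the even position
        if frac == 0.5:
            pick = lo if lo % 2 == 0 else hi
        else:
            pick = hi if frac > 0.5 else lo
    else:
        raise ValueError("unknown frac_rule: " + frac_rule)
    return float(sorted_data[pick - 1])

scores = [10, 20, 30, 40]
for rule in ["linear", "down", "up", "nearest", "halfdown", "bankers", "midpoint"]:
    print(rule, value_at_position(scores, 2.5, rule))
```

At position 2.5 the rules only differ in whether they return 20, 25 or 30, which is exactly the spread seen between the named methods.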

See he_quantileIndex() for details on this.

Note that the rounding method can even vary per quantile: the one used for quantiles below the median can differ from the one used for quantiles equal to or above it.
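
The split at the median can be pictured as a simple selection step. The helper `rules_for` is hypothetical and only mirrors the parameter descriptions (below 50 percent versus equal to or above 50 percent); how he_quantileIndex makes this choice internally is not shown here.

```python
def rules_for(p, qLfrac="linear", qLint="int", qHfrac="linear", qHint="int"):
    """Return (fractional rule, integer rule) for the quantile at proportion p."""
    if p < 0.5:
        return qLfrac, qLint
    return qHfrac, qHint

# Settings of the "ms" method: nearest below the median, halfdown from it upward.
ms = dict(qLfrac="nearest", qLint="int", qHfrac="halfdown", qHint="int")
for p in (0.25, 0.5, 0.75):
    print(p, rules_for(p, **ms))
```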

I've come across the following methods:

|method|indexing|q1 integer|q1 fractional|q3 integer|q3 fractional|
|------|--------|----------|-------------|----------|-------------|
|sas1|sas1|use int|linear|use int|linear|
|sas2|sas1|use int|bankers|use int|bankers|
|sas3|sas1|use int|up|use int|up|
|sas5|sas1|midpoint|up|midpoint|up|
|hf3b|sas1|use int|nearest|use int|halfdown|
|sas4|sas4|use int|linear|use int|linear|
|ms|sas4|use int|nearest|use int|halfdown|
|lohninger|sas4|use int|nearest|use int|nearest|
|hl2|hl|use int|linear|use int|linear|
|hl1|hl|use int|midpoint|use int|midpoint|
|excel|excel|use int|linear|use int|linear|
|pd2|excel|use int|down|use int|down|
|pd3|excel|use int|up|use int|up|
|pd4|excel|use int|halfdown|use int|nearest|
|pd5|excel|use int|midpoint|use int|midpoint|
|hf8|hf8|use int|linear|use int|linear|
|hf9|hf9|use int|linear|use int|linear|
|maple2|hl|use int|down|use int|down|

The following values can be used for the method parameter:

  1. sas1 = parzen = hf4 = interpolated_inverted_cdf = maple3 = r4 (Parzen, 1979, p. 108; SAS, 1990, p. 626; Hyndman & Fan, 1996, p. 363)
  2. sas2 = hf3 = r3 (SAS, 1990, p. 626; Hyndman & Fan, 1996, p. 362)
  3. sas3 = hf1 = inverted_cdf = maple1 = r1 (SAS, 1990, p. 626; Hyndman & Fan, 1996, p. 362)
  4. sas4 = hf6 = minitab = snedecor = weibull = maple5 = r6 (Hyndman & Fan, 1996, p. 363; Weibull, 1939, p. ?; Snedecor, 1940, p. 43; SAS, 1990, p. 626)
  5. sas5 = hf2 = cdf = averaged_inverted_cdf = r2 (SAS, 1990, p. 626; Hyndman & Fan, 1996, p. 362)
  6. hf3b = closest_observation
  7. ms (Mendenhall & Sincich, 1992, p. 35)
  8. lohninger (Lohninger, n.d.)
  9. hl1 (Hogg & Ledolter, 1992, p. 21)
  10. hl2 = hf5 = hazen = maple4 = r5 (Hogg & Ledolter, 1992, p. 21; Hazen, 1914, p. ?)
  11. maple2
  12. excel = hf7 = pd1 = linear = gumbel = maple6 = r7 (Hyndman & Fan, 1996, p. 363; Freund & Perles, 1987, p. 201; Gumbel, 1939, p. ?)
  13. pd2 = lower
  14. pd3 = higher
  15. pd4 = nearest
  16. pd5 = midpoint
  17. hf8 = median_unbiased = maple7 = r8 (Hyndman & Fan, 1996, p. 363)
  18. hf9 = normal_unbiased = maple8 = r9 (Hyndman & Fan, 1996, p. 363)

hf is short for Hyndman and Fan, who wrote an article showcasing many different methods, hl is short for Hogg and Ledolter, ms is short for Mendenhall and Sincich, jf is short for Joarder and Firozzaman. sas refers to the software package SAS, maple to Maple, pd to Python's pandas library, and r to R.

The names linear, lower, higher, nearest and midpoint are all used by pandas quantile function and numpy percentile function. Numpy also uses inverted_cdf, averaged_inverted_cdf, closest_observation, interpolated_inverted_cdf, hazen, weibull, median_unbiased, and normal_unbiased.

Before, After and Alternatives

Before this measure you might want an impression using a frequency table or a visualisation:
* tab_frequency for a frequency table
* vi_bar_stacked_single for Single Stacked Bar-Chart
* vi_bar_dual_axis for Dual-Axis Bar Chart

After this you might want some other descriptive measures:
* me_consensus for the Consensus
* me_hodges_lehmann_os for the Hodges-Lehmann Estimate (One-Sample)
* me_median for the Median
* me_quartiles for Quartiles / Hinges
* me_quartile_range for Interquartile Range, Semi-Interquartile Range and Mid-Quartile Range

or perform a test:
* ts_sign_os for One-Sample Sign Test
* ts_trinomial_os for One-Sample Trinomial Test
* ts_wilcoxon_os for Wilcoxon Signed Rank Test (One-Sample)

For more information on the quantile indexing methods and the index itself:
* he_quantileIndexing
* he_quantileIndex

References

Freund, J. E., & Perles, B. M. (1987). A new look at quartiles of ungrouped data. The American Statistician, 41(3), 200–203. doi:10.1080/00031305.1987.10475479

Galton, F. (1881). Report of the anthropometric committee. Report of the British Association for the Advancement of Science, 51, 225–272.

Gumbel, E. J. (1939). La Probabilité des Hypothèses. Comptes Rendus de l’Académie des Sciences, 209, 645–647.

Hazen, A. (1914). Storage to be provided in impounding municipal water supply. Transactions of the American Society of Civil Engineers, 77(1), 1539–1640. doi:10.1061/taceat.0002563

Hogg, R. V., & Ledolter, J. (1992). Applied statistics for engineers and physical scientists (2nd int.). Macmillan.

Hyndman, R. J., & Fan, Y. (1996). Sample quantiles in statistical packages. The American Statistician, 50(4), 361–365. doi:10.2307/2684934

Langford, E. (2006). Quartiles in elementary statistics. Journal of Statistics Education, 14(3), 1–17. doi:10.1080/10691898.2006.11910589

Lohninger, H. (n.d.). Quartile. Fundamentals of Statistics. Retrieved April 7, 2023, from http://www.statistics4u.com/fundstat_eng/cc_quartile.html

McAlister, D. (1879). The law of the geometric mean. Proceedings of the Royal Society of London, 29(196–199), 367–376. doi:10.1098/rspl.1879.0061

Mendenhall, W., & Sincich, T. (1992). Statistics for engineering and the sciences (3rd ed.). Dellen Publishing Company.

Parzen, E. (1979). Nonparametric statistical data modeling. Journal of the American Statistical Association, 74(365), 105–121. doi:10.1080/01621459.1979.10481621

SAS. (1990). SAS procedures guide: Version 6 (3rd ed.). SAS Institute.

Siegel, A. F., & Morgan, C. J. (1996). Statistics and data analysis: An introduction (2nd ed.). J. Wiley.

Snedecor, G. W. (1940). Statistical methods applied to experiments in agriculture and biology (3rd ed.). The Iowa State College Press.

Vining, G. G. (1998). Statistical methods for engineers. Duxbury Press.

Weibull, W. (1939). The phenomenon of rupture in solids. Ingeniörs Vetenskaps Akademien, 153, 1–55.

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076

Examples

Example 1: Text Pandas Series

>>> import pandas as pd
>>> student_df = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv', sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = student_df['Teach_Motivate']
>>> order = {"Fully Disagree":1, "Disagree":2, "Neither disagree nor agree":3, "Agree":4, "Fully agree":5}
>>> me_quantiles(ex1, levels=order)
(0    1.0
1    1.0
2    2.0
3    3.0
4    5.0
dtype: float64, ['Fully Disagree', 'Fully Disagree', 'Disagree', 'Neither disagree nor agree', 'Fully agree'])

Example 2: Numeric data

>>> ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
>>> me_quantiles(ex2)
0    1.0
1    2.0
2    4.0
3    5.0
4    5.0
dtype: float64

Example 3: Text data

>>> ex3 = ["a", "b", "f", "d", "e", "c"]
>>> order = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6}
>>> me_quantiles(ex3, levels=order)
(0    1.0
1    1.5
2    3.0
3    4.5
4    6.0
dtype: float64, ['a', 'between a and b', 'c', 'between d and e', 'f'])
def me_quantiles(data, levels=None, k=4, method="own", indexMethod="sas1", qLfrac="linear", qLint="int", qHfrac="linear", qHint="int"):
    '''
    Quantiles
    ---------
    
    Quantiles split the data into k sections, each containing n/k scores. They can be seen as a generalisation of various 'tiles'. For example 4-quantiles is the same as the quartiles, 5-quantiles the same as quintiles, 100-quantiles the same as percentiles, etc.
    
    Quite a few different methods exist to determine these. See the notes for more information.

    This function is shown in this [YouTube video](https://youtu.be/iI07nJ3wlOQ) and the measure is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Measures/Quantiles.html)
    
    Parameters
    ----------
    data : list or pandas series
    levels : dictionary, optional 
        coding to use
    k : int, optional
        number of quantiles. Default is 4
    method : string, optional 
        which method to use to calculate quantiles
    indexMethod : {"sas1", "sas4", "excel", "hl", "hf8", "hf9"}, optional 
        to indicate which type of indexing to use. Default is "sas1"
    qLfrac : {"linear", "down", "up", "bankers", "nearest", "halfdown", "midpoint"}, optional 
        to indicate what type of rounding to use for quantiles below 50 percent. Default is "linear"
    qLint : {"int", "midpoint"}, optional 
        to indicate the use of the integer or the midpoint method for quantiles below 50 percent. Default is "int"
    qHfrac : {"linear", "down", "up", "bankers", "nearest", "halfdown", "midpoint"}, optional 
        to indicate what type of rounding to use for quantiles equal or above 50 percent. Default is "linear"
    qHint : {"int", "midpoint"}, optional  
        to indicate the use of the integer or the midpoint method for quantiles equal or above 50 percent. Default is "int"
    
    method can be set to "own", in which case the parameters that follow it (indexMethod, qLfrac, qLint, qHfrac, qHint) are used, or to any of the methods listed in the notes.
    
    Returns
    -------
    results : pandas Series with the quantiles or, if levels are used, a tuple of that Series and a list with the text versions
    
    Notes
    -----
    To determine the quantiles a specific indexing method can be used. See **he_quantileIndexing()** for details on the different methods to choose from.
    
    Then, based on the indexes, either linear interpolation or one of the rounding methods (bankers, nearest, down, up, half-down) can be used, or the midpoint between the two values. If the index is an integer, either the value at that index or the midpoint is used. 
    
    See **he_quantileIndex()** for details on this.
    
    Note that the rounding method can even vary per quantile: the one used for quantiles below the median can differ from the one used for quantiles equal to or above it.

    I've come across the following methods:

    |method|indexing|q1 integer|q1 fractional|q3 integer|q3 fractional|
    |------|--------|----------|-------------|----------|-------------|
    |sas1|sas1|use int|linear|use int|linear|
    |sas2|sas1|use int|bankers|use int|bankers|
    |sas3|sas1|use int|up|use int|up|
    |sas5|sas1|midpoint|up|midpoint|up|
    |hf3b|sas1|use int|nearest|use int|halfdown|
    |sas4|sas4|use int|linear|use int|linear|
    |ms|sas4|use int|nearest|use int|halfdown|
    |lohninger|sas4|use int|nearest|use int|nearest|
    |hl2|hl|use int|linear|use int|linear|
    |hl1|hl|use int|midpoint|use int|midpoint|
    |excel|excel|use int|linear|use int|linear|
    |pd2|excel|use int|down|use int|down|
    |pd3|excel|use int|up|use int|up|
    |pd4|excel|use int|halfdown|use int|nearest|
    |pd5|excel|use int|midpoint|use int|midpoint|
    |hf8|hf8|use int|linear|use int|linear|
    |hf9|hf9|use int|linear|use int|linear|
    |maple2|hl|use int|down|use int|down|

    The following values can be used for the *method* parameter:

    1. sas1 = parzen = hf4 = interpolated_inverted_cdf = maple3 = r4 (Parzen, 1979, p. 108; SAS, 1990, p. 626; Hyndman & Fan, 1996, p. 363)
    1. sas2 = hf3 = r3 (SAS, 1990, p. 626; Hyndman & Fan, 1996, p. 362)
    1. sas3 = hf1 = inverted_cdf = maple1 = r1 (SAS, 1990, p. 626; Hyndman & Fan, 1996, p. 362)
    1. sas4 = hf6 = minitab = snedecor = weibull = maple5 = r6 (Hyndman & Fan, 1996, p. 363; Weibull, 1939, p. ?; Snedecor, 1940, p. 43; SAS, 1990, p. 626)
    1. sas5 = hf2 = cdf = averaged_inverted_cdf = r2 (SAS, 1990, p. 626; Hyndman & Fan, 1996, p. 362)
    1. hf3b = closest_observation
    1. ms (Mendenhall & Sincich, 1992, p. 35)
    1. lohninger (Lohninger, n.d.)
    1. hl1 (Hogg & Ledolter, 1992, p. 21)
    1. hl2 = hf5 = hazen = maple4 = r5 (Hogg & Ledolter, 1992, p. 21; Hazen, 1914, p. ?)
    1. maple2
    1. excel = hf7 = pd1 = linear = gumbel = maple6 = r7 (Hyndman & Fan, 1996, p. 363; Freund & Perles, 1987, p. 201; Gumbel, 1939, p. ?)
    1. pd2 = lower
    1. pd3 = higher
    1. pd4 = nearest
    1. pd5 = midpoint
    1. hf8 = median_unbiased = maple7 = r8 (Hyndman & Fan, 1996, p. 363)
    1. hf9 = normal_unbiased = maple8 = r9 (Hyndman & Fan, 1996, p. 363)

    *hf* is short for Hyndman and Fan, who wrote an article showcasing many different methods, *hl* is short for Hogg and Ledolter, *ms* is short for Mendenhall and Sincich, *jf* is short for Joarder and Firozzaman. *sas* refers to the software package SAS, *maple* to Maple, *pd* to Python's pandas library, and *r* to R.
    
    The names *linear*, *lower*, *higher*, *nearest* and *midpoint* are all used by pandas quantile function and numpy percentile function. Numpy also uses *inverted_cdf*, *averaged_inverted_cdf*, *closest_observation*, *interpolated_inverted_cdf*, *hazen*, *weibull*, *median_unbiased*, and *normal_unbiased*. 

    Before, After and Alternatives
    ------------------------------
    Before this measure you might want an impression using a frequency table or a visualisation:
    * [tab_frequency](../other/table_frequency.html#tab_frequency) for a frequency table
    * [vi_bar_stacked_single](../visualisations/vis_bar_stacked_single.html#vi_bar_stacked_single) for Single Stacked Bar-Chart
    * [vi_bar_dual_axis](../visualisations/vis_bar_dual_axis.html#vi_bar_dual_axis) for Dual-Axis Bar Chart

    After this you might want some other descriptive measures:
    * [me_consensus](../measures/meas_consensus.html#me_consensus) for the Consensus
    * [me_hodges_lehmann_os](../measures/meas_hodges_lehmann_os.html#me_hodges_lehmann_os) for the Hodges-Lehmann Estimate (One-Sample)
    * [me_median](../measures/meas_median.html#me_median) for the Median
    * [me_quartiles](../measures/meas_quartiles.html#me_quartiles) for Quartiles / Hinges
    * [me_quartile_range](../measures/meas_quartile_range.html#me_quartile_range) for Interquartile Range, Semi-Interquartile Range and Mid-Quartile Range
    
    or perform a test:
    * [ts_sign_os](../tests/test_sign_os.html#ts_sign_os) for One-Sample Sign Test
    * [ts_trinomial_os](../tests/test_trinomial_os.html#ts_trinomial_os) for One-Sample Trinomial Test
    * [ts_wilcoxon_os](../tests/test_wilcoxon_os.html#ts_wilcoxon_os) for Wilcoxon Signed Rank Test (One-Sample)

    For more information on the quantile indexing methods and the index itself:
    * [he_quantileIndexing](../helper/help_quantileIndexing.html#he_quantileIndexing)
    * [he_quantileIndex](../helper/help_quantileIndex.html#he_quantileIndex)
    
    References
    ----------
    Freund, J. E., & Perles, B. M. (1987). A new look at quartiles of ungrouped data. *The American Statistician, 41*(3), 200–203. doi:10.1080/00031305.1987.10475479

    Galton, F. (1881). Report of the anthropometric committee. *Report of the British Association for the Advancement of Science, 51*, 225–272.

    Gumbel, E. J. (1939). La Probabilité des Hypothèses. *Comptes Rendus de l’Académie des Sciences, 209*, 645–647.

    Hazen, A. (1914). Storage to be provided in impounding municipal water supply. *Transactions of the American Society of Civil Engineers, 77*(1), 1539–1640. doi:10.1061/taceat.0002563

    Hogg, R. V., & Ledolter, J. (1992). *Applied statistics for engineers and physical scientists* (2nd int.). Macmillan.

    Hyndman, R. J., & Fan, Y. (1996). Sample quantiles in statistical packages. *The American Statistician, 50*(4), 361–365. doi:10.2307/2684934

    Langford, E. (2006). Quartiles in elementary statistics. *Journal of Statistics Education, 14*(3), 1–17. doi:10.1080/10691898.2006.11910589

    Lohninger, H. (n.d.). Quartile. Fundamentals of Statistics. Retrieved April 7, 2023, from http://www.statistics4u.com/fundstat_eng/cc_quartile.html

    McAlister, D. (1879). The law of the geometric mean. *Proceedings of the Royal Society of London, 29*(196–199), 367–376. doi:10.1098/rspl.1879.0061

    Mendenhall, W., & Sincich, T. (1992). *Statistics for engineering and the sciences* (3rd ed.). Dellen Publishing Company.

    Parzen, E. (1979). Nonparametric statistical data modeling. *Journal of the American Statistical Association, 74*(365), 105–121. doi:10.1080/01621459.1979.10481621

    SAS. (1990). *SAS procedures guide: Version 6* (3rd ed.). SAS Institute.

    Siegel, A. F., & Morgan, C. J. (1996). *Statistics and data analysis: An introduction* (2nd ed.). J. Wiley.

    Snedecor, G. W. (1940). *Statistical methods applied to experiments in agriculture and biology* (3rd ed.). The Iowa State College Press.

    Vining, G. G. (1998). *Statistical methods for engineers*. Duxbury Press.

    Weibull, W. (1939). *The phenomenon of rupture in solids*. Ingeniörs Vetenskaps Akademien, 153, 1–55.
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076


    Examples
    --------
    Example 1: Text Pandas Series
    >>> import pandas as pd
    >>> student_df = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv', sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
    >>> ex1 = student_df['Teach_Motivate']
    >>> order = {"Fully Disagree":1, "Disagree":2, "Neither disagree nor agree":3, "Agree":4, "Fully agree":5}
    >>> me_quantiles(ex1, levels=order)
    (0    1.0
    1    1.0
    2    2.0
    3    3.0
    4    5.0
    dtype: float64, ['Fully Disagree', 'Fully Disagree', 'Disagree', 'Neither disagree nor agree', 'Fully agree'])
    
    Example 2: Numeric data
    >>> ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
    >>> me_quantiles(ex2)
    0    1.0
    1    2.0
    2    4.0
    3    5.0
    4    5.0
    dtype: float64
    
    Example 3: Text data
    >>> ex3 = ["a", "b", "f", "d", "e", "c"]
    >>> order = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6}
    >>> me_quantiles(ex3, levels=order)
    (0    1.0
    1    1.5
    2    3.0
    3    4.5
    4    6.0
    dtype: float64, ['a', 'between a and b', 'c', 'between d and e', 'f'])
    
    '''
    if isinstance(data, list):
        data = pd.Series(data)
        
    data = data.dropna()
    if levels is not None:
        pd.set_option('future.no_silent_downcasting', True)
        dataN = data.map(levels).astype('Int8')
    else:
        dataN = pd.to_numeric(data)
    
    dataN = dataN.sort_values().reset_index(drop=True)
    #dataN = list(dataN)
    
    #alternative namings
    if method in ["cdf", "sas5", "hf2", "averaged_inverted_cdf", "r2"]:
        method = "sas5"
    elif method in ["sas4", "minitab", "hf6", "weibull", "maple5", "r6"]:
        method = "sas4"
    elif method in ["excel", "hf7", "pd1", "linear", "gumbel", "maple6", "r7"]:
        method = "excel"
    elif method in ["sas1", "parzen", "hf4", "interpolated_inverted_cdf", "maple3", "r4"]:
        method = "sas1"
    elif method in ["sas2", "hf3", "r3"]:
        method = "sas2"
    elif method in ["sas3", "hf1", "inverted_cdf", "maple1", "r1"]:
        method = "sas3"
    elif method in ["hf3b", "closest_observation"]:
        method = "hf3b"
    elif method in ["hl2", "hazen", "hf5", "maple4"]:
        method = "hl2"
    elif method in ["np", "midpoint", "pd5"]:
        method = "pd5"
    elif method in ["hf8", "median_unbiased", "maple7", "r8"]:
        method = "hf8"
    elif method in ["hf9", "normal_unbiased", "maple8", "r9"]:
        method = "hf9"
    elif method in ["pd2", "lower"]:
        method = "pd2"
    elif method in ["pd3", "higher"]:
        method = "pd3"
    elif method in ["pd4", "nearest"]:
        method = "pd4"
    
    #settings
    settings = [indexMethod, qLfrac, qLint, qHfrac, qHint]
    if method=="sas1":
        settings = ["sas1","linear","int","linear","int"]
    elif method=="sas2":
        settings = ["sas1","bankers","int","bankers" ,"int"]
    elif method=="sas3":
        settings = ["sas1","up","int","up","int"]
    elif method=="sas5":
        settings = ["sas1","up","midpoint","up","midpoint"]
    elif method=="sas4":    
        settings = ["sas4","linear", "int","linear","int"]
    elif method=="ms": 
        settings = ["sas4", "nearest","int", "halfdown","int"]
    elif method=="lohninger":
        settings = ["sas4", "nearest", "int","nearest","int"]
    elif method=="hl2":
        settings = ["hl", "linear", "int","linear","int"]
    elif method=="hl1":
        settings = ["hl", "midpoint","int", "midpoint","int"]
    elif method=="excel":
        settings = ["excel", "linear","int","linear", "int"]
    elif method=="pd2":
        settings = ["excel", "down", "int", "down","int"]
    elif method=="pd3":
        settings = ["excel", "up","int","up","int"]
    elif method=="pd4":
        settings = ["excel", "halfdown",  "int","nearest", "int"]
    elif method=="hf3b":
        settings = ["sas1", "nearest","int","halfdown","int"]
    elif method=="pd5":
        settings = ["excel", "midpoint","int","midpoint","int"]
    elif method=="hf8":
        settings = ["hf8", "linear","int","linear", "int"]
    elif method=="hf9":
        settings = ["hf9", "linear","int","linear", "int"]
    elif method=="maple2":
        settings = ["hl", "down","int","down", "int"]
    
    quantiles = he_quantileIndex(dataN, k, settings[0], settings[1], settings[2], settings[3], settings[4])
    #he_quantileIndex(data, k=4, indexMethod="sas1", qLfrac="linear", qLint="int", qHfrac="linear", qHint="int")
    #find the text representatives
    if levels is not None:
        #look up the label that belongs to each coded quantile value
        labels = list(levels.keys())
        codes = list(levels.values())
        quantilesText = []
        for i in range(k+1):
            if quantiles[i] == round(quantiles[i]):
                #the quantile falls exactly on one category
                qT = labels[codes.index(quantiles[i])]
            else:
                #the quantile falls between two adjacent categories
                qT = "between " + labels[codes.index(math.floor(quantiles[i]))] + " and " + labels[codes.index(math.ceil(quantiles[i]))]
            quantilesText.append(qT)
            
        results = quantiles, quantilesText
    else:
        results = quantiles
    
    return results
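
The label lookup at the end of the function can be tried in isolation. The sketch below is not part of the package: `quantile_label` is a hypothetical helper name, and it assumes the codes in `levels` are unique integers. It mirrors the logic above: a whole-number quantile maps to a single category, and any other value is reported as lying between the two neighbouring categories.

```python
import math

def quantile_label(value, levels):
    """Return the text label for a coded quantile value (hypothetical helper)."""
    # invert the coding so that each numeric code points to its label
    # (assumes every code occurs only once)
    labels = {code: label for label, code in levels.items()}
    if value == round(value):
        # the quantile falls exactly on one category
        return labels[int(value)]
    # the quantile falls between two adjacent categories
    return "between " + labels[math.floor(value)] + " and " + labels[math.ceil(value)]

order = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6}
print([quantile_label(q, order) for q in [1.0, 1.5, 3.0, 4.5, 6.0]])
# ['a', 'between a and b', 'c', 'between d and e', 'f']
```

The input values are the quantiles from Example 3, so the printed labels match the second element of the tuple shown there.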