Module `stikpetP.measures.meas_quantiles`


import pandas as pd
import math
from ..helper.help_quantileIndex import he_quantileIndex

def me_quantiles(data, levels=None, k=4, method="own", indexMethod="sas1", qLfrac="linear", qLint="int", qHfrac="linear", qHint="int"):
    '''
    Quantiles
    ---------
    
    Quantiles split the data into k sections, each containing (about) n/k scores. They generalise the various 'tiles': the 4-quantiles are the quartiles, the 5-quantiles the quintiles, the 100-quantiles the percentiles, and so on.
    
    Many different methods exist to determine these; see the notes for more information.

    This function is shown in this [YouTube video](https://youtu.be/iI07nJ3wlOQ) and the measure is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Measures/Quantiles.html).
    
    Parameters
    ----------
    data : list or pandas series
    levels : dictionary, optional 
        coding to use
    k : int, optional
        number of quantiles, i.e. the number of sections to split the data into (4 for quartiles). Default is 4
    method : string, optional 
        which method to use to calculate quantiles
    indexMethod : {"sas1", "sas4", "excel", "hl", "hf8", "hf9"}, optional 
        to indicate which type of indexing to use. Default is "sas1"
    qLfrac : {"linear", "down", "up", "bankers", "nearest", "halfdown", "midpoint"}, optional 
        to indicate what type of interpolation or rounding to use for quantiles below 50 percent when the index is fractional. Default is "linear"
    qLint : {"int", "midpoint"}, optional 
        to indicate the use of the integer or the midpoint method for quantiles below 50 percent when the index is an integer. Default is "int"
    qHfrac : {"linear", "down", "up", "bankers", "nearest", "halfdown", "midpoint"}, optional 
        to indicate what type of interpolation or rounding to use for quantiles equal to or above 50 percent when the index is fractional. Default is "linear"
    qHint : {"int", "midpoint"}, optional  
        to indicate the use of the integer or the midpoint method for quantiles equal to or above 50 percent when the index is an integer. Default is "int"
    
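    Set method to "own" (the default) to use the indexMethod, qLfrac, qLint, qHfrac and qHint values given; choosing any of the methods listed in the notes overrides these values. An unrecognised method name also falls back to these values.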
    
    Returns
    -------
    results : pandas Series with the k + 1 quantile values, from the minimum to the maximum; if levels is used, a tuple of this series and a list with the matching text labels
    
    Notes
    -----
    To determine the quantiles, a specific indexing method can be used. See **he_quantileIndexing()** for details on the different methods to choose from.
    
    Then, based on the index, either linear interpolation, a rounding method (bankers, nearest, down, up, half-down), or the midpoint between the two neighbouring values can be used. If the index is an integer, either the value at that index or a midpoint is used.
    
    See **he_quantileIndex()** for details on this.
    
    Note that the method can even differ per quantile: the one used for quantiles below the median can be different from the one used for those equal to or above it.

    I've come across the following methods:

    |method|indexing|below 50%, integer index|below 50%, fractional index|50% and above, integer index|50% and above, fractional index|
    |------|--------|----------|-------------|----------|-------------|
    |sas1|sas1|use int|linear|use int|linear|
    |sas2|sas1|use int|bankers|use int|bankers|
    |sas3|sas1|use int|up|use int|up|
    |sas5|sas1|midpoint|up|midpoint|up|
    |hf3b|sas1|use int|nearest|use int|halfdown|
    |sas4|sas4|use int|linear|use int|linear|
    |ms|sas4|use int|nearest|use int|halfdown|
    |lohninger|sas4|use int|nearest|use int|nearest|
    |hl2|hl|use int|linear|use int|linear|
    |hl1|hl|use int|midpoint|use int|midpoint|
    |maple2|hl|use int|down|use int|down|
    |excel|excel|use int|linear|use int|linear|
    |pd2|excel|use int|down|use int|down|
    |pd3|excel|use int|up|use int|up|
    |pd4|excel|use int|halfdown|use int|nearest|
    |pd5|excel|use int|midpoint|use int|midpoint|
    |hf8|hf8|use int|linear|use int|linear|
    |hf9|hf9|use int|linear|use int|linear|

    The following values can be used for the *method* parameter:

    1. sas1 = parzen = hf4 = interpolated_inverted_cdf = maple3 = r4 (Parzen, 1979, p. 108; SAS, 1990, p. 626; Hyndman & Fan, 1996, p. 363)
    1. sas2 = hf3 = r3 (SAS, 1990, p. 626; Hyndman & Fan, 1996, p. 362)
    1. sas3 = hf1 = inverted_cdf = maple1 = r1 (SAS, 1990, p. 626; Hyndman & Fan, 1996, p. 362)
    1. sas4 = hf6 = minitab = snedecor = weibull = maple5 = r6 (Hyndman & Fan, 1996, p. 363; Weibull, 1939, p. ?; Snedecor, 1940, p. 43; SAS, 1990, p. 626)
    1. sas5 = hf2 = CDF = averaged_inverted_cdf = r2 (SAS, 1990, p. 626; Hyndman & Fan, 1996, p. 362)
    1. hf3b = closest_observation 
    1. ms (Mendenhall & Sincich, 1992, p. 35)
    1. lohninger (Lohninger, n.d.)
    1. hl1 (Hogg & Ledolter, 1992, p. 21)
    1. hl2 = hf5 = Hazen = maple4 = r5 (Hogg & Ledolter, 1992, p. 21; Hazen, 1914, p. ?)
    1. maple2
    1. excel = hf7 = pd1 = linear = gumbel = maple6 = r7 (Hyndman & Fan, 1996, p. 363; Freund & Perles, 1987, p. 201; Gumbel, 1939, p. ?)
    1. pd2 = lower
    1. pd3 = higher
    1. pd4 = nearest
    1. pd5 = midpoint
    1. hf8 = median_unbiased = maple7 = r8 (Hyndman & Fan, 1996, p. 363)
    1. hf9 = normal_unbiased = maple8 = r9 (Hyndman & Fan, 1996, p. 363)

    *hf* is short for Hyndman and Fan, who wrote an article showcasing many different methods, *hl* is short for Hogg and Ledolter, *ms* is short for Mendenhall and Sincich, and *jf* is short for Joarder and Firozzaman. *sas* refers to the software package SAS, *maple* to Maple, *pd* to Python's pandas library, and *r* to R.
    
    The names *linear*, *lower*, *higher*, *nearest* and *midpoint* are all used by the pandas quantile function (*interpolation* parameter) and the numpy percentile function (*method* parameter). Numpy additionally uses *inverted_cdf*, *averaged_inverted_cdf*, *closest_observation*, *interpolated_inverted_cdf*, *hazen*, *weibull*, *median_unbiased*, and *normal_unbiased*.

    Before, After and Alternatives
    ------------------------------
    Before this measure you might want an impression using a frequency table or a visualisation:
    * [tab_frequency](../other/table_frequency.html#tab_frequency) for a frequency table
    * [vi_bar_stacked_single](../visualisations/vis_bar_stacked_single.html#vi_bar_stacked_single) for Single Stacked Bar-Chart
    * [vi_bar_dual_axis](../visualisations/vis_bar_dual_axis.html#vi_bar_dual_axis) for Dual-Axis Bar Chart

    After this you might want some other descriptive measures:
    * [me_consensus](../measures/meas_consensus.html#me_consensus) for the Consensus
    * [me_hodges_lehmann_os](../measures/meas_hodges_lehmann_os.html#me_hodges_lehmann_os) for the Hodges-Lehmann Estimate (One-Sample)
    * [me_median](../measures/meas_median.html#me_median) for the Median
    * [me_quartiles](../measures/meas_quartiles.html#me_quartiles) for Quartiles / Hinges
    * [me_quartile_range](../measures/meas_quartile_range.html#me_quartile_range) for Interquartile Range, Semi-Interquartile Range and Mid-Quartile Range
    
    or perform a test:
    * [ts_sign_os](../tests/test_sign_os.html#ts_sign_os) for One-Sample Sign Test
    * [ts_trinomial_os](../tests/test_trinomial_os.html#ts_trinomial_os) for One-Sample Trinomial Test
    * [ts_wilcoxon_os](../tests/test_wilcoxon_os.html#ts_wilcoxon_os) for Wilcoxon Signed Rank Test (One-Sample)

    For more information on the quantile indexing methods and the quantile index itself:
    * [he_quantileIndexing](../helper/help_quantileIndexing.html#he_quantileIndexing)
    * [he_quantileIndex](../helper/help_quantileIndex.html#he_quantileIndex)
    
    References
    ----------
    Freund, J. E., & Perles, B. M. (1987). A new look at quartiles of ungrouped data. *The American Statistician, 41*(3), 200–203. doi:10.1080/00031305.1987.10475479

    Galton, F. (1881). Report of the anthropometric committee. *Report of the British Association for the Advancement of Science, 51*, 225–272.

    Gumbel, E. J. (1939). La Probabilité des Hypothèses. *Comptes Rendus de l'Académie des Sciences, 209*, 645–647.

    Hazen, A. (1914). Storage to be provided in impounding municipal water supply. *Transactions of the American Society of Civil Engineers, 77*(1), 1539–1640. doi:10.1061/taceat.0002563

    Hogg, R. V., & Ledolter, J. (1992). *Applied statistics for engineers and physical scientists* (2nd int.). Macmillan.

    Hyndman, R. J., & Fan, Y. (1996). Sample quantiles in statistical packages. *The American Statistician, 50*(4), 361–365. doi:10.2307/2684934

    Langford, E. (2006). Quartiles in elementary statistics. *Journal of Statistics Education, 14*(3), 1–17. doi:10.1080/10691898.2006.11910589

    Lohninger, H. (n.d.). Quartile. Fundamentals of Statistics. Retrieved April 7, 2023, from http://www.statistics4u.com/fundstat_eng/cc_quartile.html

    McAlister, D. (1879). The law of the geometric mean. *Proceedings of the Royal Society of London, 29*(196–199), 367–376. doi:10.1098/rspl.1879.0061

    Mendenhall, W., & Sincich, T. (1992). *Statistics for engineering and the sciences* (3rd ed.). Dellen Publishing Company.

    Parzen, E. (1979). Nonparametric statistical data modeling. *Journal of the American Statistical Association, 74*(365), 105–121. doi:10.1080/01621459.1979.10481621

    SAS. (1990). *SAS procedures guide: Version 6* (3rd ed.). SAS Institute.

    Siegel, A. F., & Morgan, C. J. (1996). *Statistics and data analysis: An introduction* (2nd ed.). J. Wiley.

    Snedecor, G. W. (1940). *Statistical methods applied to experiments in agriculture and biology* (3rd ed.). The Iowa State College Press.

    Vining, G. G. (1998). *Statistical methods for engineers*. Duxbury Press.

    Weibull, W. (1939). *The phenomenon of rupture in solids*. Ingeniörs Vetenskaps Akademien, 153, 1–55.
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076


    Examples
    --------
    Example 1: Text Pandas Series
    >>> import pandas as pd
    >>> student_df = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv', sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
    >>> ex1 = student_df['Teach_Motivate']
    >>> order = {"Fully Disagree":1, "Disagree":2, "Neither disagree nor agree":3, "Agree":4, "Fully agree":5}
    >>> me_quantiles(ex1, levels=order)
    (0    1.0
    1    1.0
    2    2.0
    3    3.0
    4    5.0
    dtype: float64, ['Fully Disagree', 'Fully Disagree', 'Disagree', 'Neither disagree nor agree', 'Fully agree'])
    
    Example 2: Numeric data
    >>> ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
    >>> me_quantiles(ex2)
    0    1.0
    1    2.0
    2    4.0
    3    5.0
    4    5.0
    dtype: float64
    
    Example 3: Text data
    >>> ex3 = ["a", "b", "f", "d", "e", "c"]
    >>> order = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6}
    >>> me_quantiles(ex3, levels=order)
    (0    1.0
    1    1.5
    2    3.0
    3    4.5
    4    6.0
    dtype: float64, ['a', 'between a and b', 'c', 'between d and e', 'f'])
    
    '''
    if isinstance(data, list):
        data = pd.Series(data)
        
    data = data.dropna()
    if levels is not None:
        pd.set_option('future.no_silent_downcasting', True)
        dataN = data.map(levels).astype('Int8')
    else:
        dataN = pd.to_numeric(data)
    
    dataN = dataN.sort_values().reset_index(drop=True)
    
    # alternative namings (compared in lower case, so e.g. "CDF" and "Hazen" are recognised)
    method = method.lower()
    if method in ["cdf", "sas5", "hf2", "averaged_inverted_cdf", "r2"]:
        method = "sas5"
    elif method in ["sas4", "minitab", "hf6", "snedecor", "weibull", "maple5", "r6"]:
        method = "sas4"
    elif method in ["excel", "hf7", "pd1", "linear", "gumbel", "maple6", "r7"]:
        method = "excel"
    elif method in ["sas1", "parzen", "hf4", "interpolated_inverted_cdf", "maple3", "r4"]:
        method = "sas1"
    elif method in ["sas2", "hf3", "r3"]:
        method = "sas2"
    elif method in ["sas3", "hf1", "inverted_cdf", "maple1", "r1"]:
        method = "sas3"
    elif method in ["hf3b", "closest_observation"]:
        method = "hf3b"
    elif method in ["hl2", "hazen", "hf5", "maple4", "r5"]:
        method = "hl2"
    elif method in ["np", "midpoint", "pd5"]:
        method = "pd5"
    elif method in ["hf8", "median_unbiased", "maple7", "r8"]:
        method = "hf8"
    elif method in ["hf9", "normal_unbiased", "maple8", "r9"]:
        method = "hf9"
    elif method in ["pd2", "lower"]:
        method = "pd2"
    elif method in ["pd3", "higher"]:
        method = "pd3"
    elif method in ["pd4", "nearest"]:
        method = "pd4"
    
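    # settings: [indexMethod, qLfrac, qLint, qHfrac, qHint]
    # "own" (or an unrecognised name) keeps the values passed as arguments
    settings = [indexMethod, qLfrac, qLint, qHfrac, qHint]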
    if method=="sas1":
        settings = ["sas1","linear","int","linear","int"]
    elif method=="sas2":
        settings = ["sas1","bankers","int","bankers" ,"int"]
    elif method=="sas3":
        settings = ["sas1","up","int","up","int"]
    elif method=="sas5":
        settings = ["sas1","up","midpoint","up","midpoint"]
    elif method=="sas4":    
        settings = ["sas4","linear", "int","linear","int"]
    elif method=="ms": 
        settings = ["sas4", "nearest","int", "halfdown","int"]
    elif method=="lohninger":
        settings = ["sas4", "nearest", "int","nearest","int"]
    elif method=="hl2":
        settings = ["hl", "linear", "int","linear","int"]
    elif method=="hl1":
        settings = ["hl", "midpoint","int", "midpoint","int"]
    elif method=="excel":
        settings = ["excel", "linear","int","linear", "int"]
    elif method=="pd2":
        settings = ["excel", "down", "int", "down","int"]
    elif method=="pd3":
        settings = ["excel", "up","int","up","int"]
    elif method=="pd4":
        settings = ["excel", "halfdown",  "int","nearest", "int"]
    elif method=="hf3b":
        settings = ["sas1", "nearest","int","halfdown","int"]
    elif method=="pd5":
        settings = ["excel", "midpoint","int","midpoint","int"]
    elif method=="hf8":
        settings = ["hf8", "linear","int","linear", "int"]
    elif method=="hf9":
        settings = ["hf9", "linear","int","linear", "int"]
    elif method=="maple2":
        settings = ["hl", "down","int","down", "int"]
    
    quantiles = he_quantileIndex(dataN, k, settings[0], settings[1], settings[2], settings[3], settings[4])
    #he_quantileIndex(data, k=4, indexMethod="sas1", qLfrac="linear", qLint="int", qHfrac="linear", qHint="int")
    #find the text representatives
    if levels is not None:
        quantilesText = []
        for i in range(k+1):
            if quantiles[i] == round(quantiles[i]):
                qT = list(levels.keys())[list(levels.values()).index(quantiles[i])]

            else:
                qT = "between " + list(levels.keys())[list(levels.values()).index(math.floor(quantiles[i]))] + " and " + list(levels.keys())[list(levels.values()).index(math.ceil(quantiles[i]))]
            quantilesText.append(qT)
            
        results = quantiles, quantilesText
    else:
        results = quantiles
    
    return results

Functions

def me_quantiles(data, levels=None, k=4, method='own', indexMethod='sas1', qLfrac='linear', qLint='int', qHfrac='linear', qHint='int')

Quantiles

Quantiles split the data into k sections, each containing (about) n/k scores. They generalise the various 'tiles': the 4-quantiles are the quartiles, the 5-quantiles the quintiles, the 100-quantiles the percentiles, and so on.

Many different methods exist to determine these; see the notes for more information.

This function is shown in this YouTube video and the measure is also described at PeterStatistics.com.
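As a small illustration of what k controls: in the examples below, k = 4 returns five values, running from the minimum through the three quartiles to the maximum. A minimal sketch of the cumulative proportions at which those k + 1 values are taken is shown here; the helper name `quantile_levels` is hypothetical and not part of stikpetP.

```python
from fractions import Fraction

def quantile_levels(k):
    """Cumulative proportions 0, 1/k, 2/k, ..., 1 at which the k + 1 values are taken.

    Hypothetical helper for illustration; not part of stikpetP.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    # exact fractions avoid floating-point noise such as 0.30000000000000004
    return [Fraction(i, k) for i in range(k + 1)]

# quartiles (k = 4): proportions 0, 1/4, 1/2, 3/4 and 1
print(quantile_levels(4))
```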

Parameters

data : list or pandas series
levels : dictionary, optional: coding to use
k : int, optional: number of quantiles, i.e. the number of sections to split the data into (4 for quartiles). Default is 4
method : string, optional: which method to use to calculate quantiles
indexMethod : {"sas1", "sas4", "excel", "hl", "hf8", "hf9"}, optional: to indicate which type of indexing to use. Default is "sas1"
qLfrac : {"linear", "down", "up", "bankers", "nearest", "halfdown", "midpoint"}, optional: to indicate what type of interpolation or rounding to use for quantiles below 50 percent when the index is fractional. Default is "linear"
qLint : {"int", "midpoint"}, optional: to indicate the use of the integer or the midpoint method for quantiles below 50 percent when the index is an integer. Default is "int"
qHfrac : {"linear", "down", "up", "bankers", "nearest", "halfdown", "midpoint"}, optional: to indicate what type of interpolation or rounding to use for quantiles equal to or above 50 percent when the index is fractional. Default is "linear"
qHint : {"int", "midpoint"}, optional: to indicate the use of the integer or the midpoint method for quantiles equal to or above 50 percent when the index is an integer. Default is "int"

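Set method to "own" (the default) to use the indexMethod, qLfrac, qLint, qHfrac and qHint values given; choosing any of the methods listed in the notes overrides these values. An unrecognised method name also falls back to these values.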

Returns

results : pandas Series with the k + 1 quantile values, from the minimum to the maximum; if levels is used, a tuple of this series and a list with the matching text labels

Notes

To determine the quantiles, a specific indexing method can be used. See he_quantileIndexing() for details on the different methods to choose from.
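To make the indexing step concrete, here is a minimal sketch that computes the 1-based position h of proportion p among n sorted values for the six indexMethod options, using the Hyndman and Fan (1996) position formulas for the methods the notes equate with hf4 to hf9, and then interpolates linearly. It is an assumption that he_quantileIndex uses exactly these positions and clamps them to 1..n; `POSITION` and `linear_quantile` are hypothetical names. With sas1 it gives the same values as Examples 2 and 3 below, and for sas4 and excel it agrees with Python's `statistics.quantiles`, whose 'exclusive' and 'inclusive' methods correspond to hf6 and hf7.

```python
import math
import statistics

# 1-based position h of proportion p among n sorted values, per indexing method.
# Formulas follow Hyndman & Fan (1996); assuming the indexMethod options match them.
POSITION = {
    "sas1": lambda n, p: n * p,                   # hf4
    "hl": lambda n, p: n * p + 0.5,               # hf5
    "sas4": lambda n, p: (n + 1) * p,             # hf6
    "excel": lambda n, p: (n - 1) * p + 1,        # hf7
    "hf8": lambda n, p: (n + 1 / 3) * p + 1 / 3,
    "hf9": lambda n, p: (n + 1 / 4) * p + 3 / 8,
}

def linear_quantile(values, p, index_method="sas1"):
    """Quantile at proportion p, interpolating linearly between order statistics."""
    x = sorted(values)
    n = len(x)
    h = POSITION[index_method](n, p)
    h = min(max(h, 1), n)          # clamp the position to the observed range
    lo = math.floor(h)
    hi = min(lo + 1, n)
    return x[lo - 1] + (h - lo) * (x[hi - 1] - x[lo - 1])

ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
print([linear_quantile(ex2, p) for p in (0, 0.25, 0.5, 0.75, 1)])  # same values as Example 2

data = [1, 2, 3, 4, 5, 6]
print([linear_quantile(data, p, "sas4") for p in (0.25, 0.5, 0.75)])
print(statistics.quantiles(data, n=4, method="exclusive"))
```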

Then, based on the index, either linear interpolation, a rounding method (bankers, nearest, down, up, half-down), or the midpoint between the two neighbouring values can be used. If the index is an integer, either the value at that index or a midpoint is used.
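The choices for a fractional or an integer index can be sketched as below. This is one reading of the option names, not the code of he_quantileIndex: "nearest" is assumed to send a tie (.5) up, "halfdown" to send it down, "bankers" to send it to the even position (Python's built-in `round` rounds half to even), and an integer-index "midpoint" to average the value at the index with the next one. `order_stat` and `select_value` are hypothetical names.

```python
import math

def order_stat(x, i):
    """Value at 1-based position i of sorted data x, clamped to positions 1..n."""
    i = min(max(i, 1), len(x))
    return x[i - 1]

def select_value(x, h, frac_method="linear", int_method="int"):
    """Pick a value from sorted data x at a (possibly fractional) 1-based position h.

    Hypothetical sketch; the rounding rules are assumptions about the option names.
    """
    lo = math.floor(h)
    frac = h - lo
    if frac == 0:
        if int_method == "int":
            return order_stat(x, lo)
        if int_method == "midpoint":
            # average the value at the position with the next one
            return (order_stat(x, lo) + order_stat(x, lo + 1)) / 2
        raise ValueError(f"unknown int_method: {int_method}")
    below, above = order_stat(x, lo), order_stat(x, lo + 1)
    if frac_method == "linear":
        return below + frac * (above - below)
    if frac_method == "down":
        return below
    if frac_method == "up":
        return above
    if frac_method == "midpoint":
        return (below + above) / 2
    if frac_method == "nearest":    # a tie (.5) goes up
        return above if frac >= 0.5 else below
    if frac_method == "halfdown":   # a tie (.5) goes down
        return above if frac > 0.5 else below
    if frac_method == "bankers":    # a tie goes to the even position
        return order_stat(x, round(h))
    raise ValueError(f"unknown frac_method: {frac_method}")

x = [1, 2, 3, 4, 5, 6]
# sas5-style median: sas1 position h = n * p, "up" for fractions, "midpoint" for integers
print(select_value(x, len(x) * 0.5, "up", "midpoint"))  # 3.5
```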

See he_quantileIndex() for details on this.

Note that the method can even differ per quantile: the one used for quantiles below the median can be different from the one used for those equal to or above it.

I've come across the following methods:

|method|indexing|below 50%, integer index|below 50%, fractional index|50% and above, integer index|50% and above, fractional index|
|------|--------|----------|-------------|----------|-------------|
|sas1|sas1|use int|linear|use int|linear|
|sas2|sas1|use int|bankers|use int|bankers|
|sas3|sas1|use int|up|use int|up|
|sas5|sas1|midpoint|up|midpoint|up|
|hf3b|sas1|use int|nearest|use int|halfdown|
|sas4|sas4|use int|linear|use int|linear|
|ms|sas4|use int|nearest|use int|halfdown|
|lohninger|sas4|use int|nearest|use int|nearest|
|hl2|hl|use int|linear|use int|linear|
|hl1|hl|use int|midpoint|use int|midpoint|
|maple2|hl|use int|down|use int|down|
|excel|excel|use int|linear|use int|linear|
|pd2|excel|use int|down|use int|down|
|pd3|excel|use int|up|use int|up|
|pd4|excel|use int|halfdown|use int|nearest|
|pd5|excel|use int|midpoint|use int|midpoint|
|hf8|hf8|use int|linear|use int|linear|
|hf9|hf9|use int|linear|use int|linear|
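The ms and lohninger rows differ only in how a tie at .5 is rounded for quantiles at or above the median. A minimal sketch, assuming sas4 positions h = (n + 1)p and reading "nearest" as rounding .5 up and "halfdown" as rounding .5 down, shows the effect: with the ms combination both outer quartiles round towards the median, so their positions are symmetric. `round_position` and `quartiles_by_rounding` are hypothetical names.

```python
import math

def round_position(h, rule):
    """Round a fractional 1-based position; the two rules differ only on ties (.5)."""
    lo = math.floor(h)
    frac = h - lo
    if rule == "nearest":      # assumed: a tie goes up
        return lo + 1 if frac >= 0.5 else lo
    if rule == "halfdown":     # assumed: a tie goes down
        return lo + 1 if frac > 0.5 else lo
    raise ValueError(f"unsupported rule: {rule}")

def quartiles_by_rounding(values, lower_rule, upper_rule):
    """Q1, Q2, Q3 with sas4 positions; lower_rule below 50 percent, upper_rule from 50 percent up."""
    x = sorted(values)
    n = len(x)
    result = []
    for p in (0.25, 0.5, 0.75):
        rule = lower_rule if p < 0.5 else upper_rule
        i = round_position((n + 1) * p, rule)
        i = min(max(i, 1), n)              # clamp to an existing order statistic
        result.append(x[i - 1])
    return result

data = [10, 20, 30, 40, 50]
print(quartiles_by_rounding(data, "nearest", "halfdown"))  # ms: [20, 30, 40]
print(quartiles_by_rounding(data, "nearest", "nearest"))   # lohninger: [20, 30, 50]
```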

The following values can be used for the method parameter:

sas1 = parzen = hf4 = interpolated_inverted_cdf = maple3 = r4 (Parzen, 1979, p. 108; SAS, 1990, p. 626; Hyndman & Fan, 1996, p. 363)
sas2 = hf3 = r3 (SAS, 1990, p. 626; Hyndman & Fan, 1996, p. 362)
sas3 = hf1 = inverted_cdf = maple1 = r1 (SAS, 1990, p. 626; Hyndman & Fan, 1996, p. 362)
sas4 = hf6 = minitab = snedecor = weibull = maple5 = r6 (Hyndman & Fan, 1996, p. 363; Weibull, 1939, p. ?; Snedecor, 1940, p. 43; SAS, 1990, p. 626)
sas5 = hf2 = CDF = averaged_inverted_cdf = r2 (SAS, 1990, p. 626; Hyndman & Fan, 1996, p. 362)
hf3b = closest_observation
ms (Mendenhall & Sincich, 1992, p. 35)
lohninger (Lohninger, n.d.)
hl1 (Hogg & Ledolter, 1992, p. 21)
hl2 = hf5 = Hazen = maple4 = r5 (Hogg & Ledolter, 1992, p. 21; Hazen, 1914, p. ?)
maple2
excel = hf7 = pd1 = linear = gumbel = maple6 = r7 (Hyndman & Fan, 1996, p. 363; Freund & Perles, 1987, p. 201; Gumbel, 1939, p. ?)
pd2 = lower
pd3 = higher
pd4 = nearest
pd5 = midpoint
hf8 = median_unbiased = maple7 = r8 (Hyndman & Fan, 1996, p. 363)
hf9 = normal_unbiased = maple8 = r9 (Hyndman & Fan, 1996, p. 363)
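The source resolves these aliases with an if/elif chain. A dictionary-based sketch of the same lookup, built from the list above and matching names case-insensitively (so "CDF" and "Hazen" also resolve), could look like this; `ALIASES`, `CANONICAL` and `canonical_method` are hypothetical names. Names that are not aliases, such as "own", "ms" or "lohninger", come back unchanged apart from lower-casing.

```python
# Hypothetical alias table built from the method list; all keys are lower case.
ALIASES = {
    "sas1": ["parzen", "hf4", "interpolated_inverted_cdf", "maple3", "r4"],
    "sas2": ["hf3", "r3"],
    "sas3": ["hf1", "inverted_cdf", "maple1", "r1"],
    "sas4": ["hf6", "minitab", "snedecor", "weibull", "maple5", "r6"],
    "sas5": ["hf2", "cdf", "averaged_inverted_cdf", "r2"],
    "hf3b": ["closest_observation"],
    "hl2": ["hf5", "hazen", "maple4", "r5"],
    "excel": ["hf7", "pd1", "linear", "gumbel", "maple6", "r7"],
    "pd2": ["lower"],
    "pd3": ["higher"],
    "pd4": ["nearest"],
    "pd5": ["midpoint"],
    "hf8": ["median_unbiased", "maple7", "r8"],
    "hf9": ["normal_unbiased", "maple8", "r9"],
}

# Invert once so each lookup is a single dictionary access: alias -> canonical name.
CANONICAL = {alias: name for name, aliases in ALIASES.items() for alias in aliases}

def canonical_method(method):
    """Return the canonical method name; names that are not aliases come back lower-cased."""
    key = method.lower()
    return CANONICAL.get(key, key)

print(canonical_method("Hazen"))  # hl2
```

Keeping the aliases in one table makes it easier to keep the documented list and the lookup in sync.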

hf is short for Hyndman and Fan, who wrote an article showcasing many different methods, hl is short for Hogg and Ledolter, ms is short for Mendenhall and Sincich, and jf is short for Joarder and Firozzaman. sas refers to the software package SAS, maple to Maple, pd to Python's pandas library, and r to R.

The names linear, lower, higher, nearest and midpoint are all used by the pandas quantile function (interpolation parameter) and the numpy percentile function (method parameter). Numpy additionally uses inverted_cdf, averaged_inverted_cdf, closest_observation, interpolated_inverted_cdf, hazen, weibull, median_unbiased, and normal_unbiased.

Before, After and Alternatives

Before this measure you might want an impression using a frequency table or a visualisation:
* tab_frequency for a frequency table
* vi_bar_stacked_single for Single Stacked Bar-Chart
* vi_bar_dual_axis for Dual-Axis Bar Chart

After this you might want some other descriptive measures:
* me_consensus for the Consensus
* me_hodges_lehmann_os for the Hodges-Lehmann Estimate (One-Sample)
* me_median for the Median
* me_quartiles for Quartiles / Hinges
* me_quartile_range for Interquartile Range, Semi-Interquartile Range and Mid-Quartile Range

or perform a test:
* ts_sign_os for One-Sample Sign Test
* ts_trinomial_os for One-Sample Trinomial Test
* ts_wilcoxon_os for Wilcoxon Signed Rank Test (One-Sample)

For more information on the quantile indexing methods and the quantile index itself:
* he_quantileIndexing
* he_quantileIndex

References

Freund, J. E., & Perles, B. M. (1987). A new look at quartiles of ungrouped data. The American Statistician, 41(3), 200–203. doi:10.1080/00031305.1987.10475479

Galton, F. (1881). Report of the anthropometric committee. Report of the British Association for the Advancement of Science, 51, 225–272.

Gumbel, E. J. (1939). La Probabilité des Hypothèses. Comptes Rendus de l'Académie des Sciences, 209, 645–647.

Hazen, A. (1914). Storage to be provided in impounding municipal water supply. Transactions of the American Society of Civil Engineers, 77(1), 1539–1640. doi:10.1061/taceat.0002563

Hogg, R. V., & Ledolter, J. (1992). Applied statistics for engineers and physical scientists (2nd int.). Macmillan.

Hyndman, R. J., & Fan, Y. (1996). Sample quantiles in statistical packages. The American Statistician, 50(4), 361–365. doi:10.2307/2684934

Langford, E. (2006). Quartiles in elementary statistics. Journal of Statistics Education, 14(3), 1–17. doi:10.1080/10691898.2006.11910589

Lohninger, H. (n.d.). Quartile. Fundamentals of Statistics. Retrieved April 7, 2023, from http://www.statistics4u.com/fundstat_eng/cc_quartile.html

McAlister, D. (1879). The law of the geometric mean. Proceedings of the Royal Society of London, 29(196–199), 367–376. doi:10.1098/rspl.1879.0061

Mendenhall, W., & Sincich, T. (1992). Statistics for engineering and the sciences (3rd ed.). Dellen Publishing Company.

Parzen, E. (1979). Nonparametric statistical data modeling. Journal of the American Statistical Association, 74(365), 105–121. doi:10.1080/01621459.1979.10481621

SAS. (1990). SAS procedures guide: Version 6 (3rd ed.). SAS Institute.

Siegel, A. F., & Morgan, C. J. (1996). Statistics and data analysis: An introduction (2nd ed.). J. Wiley.

Snedecor, G. W. (1940). Statistical methods applied to experiments in agriculture and biology (3rd ed.). The Iowa State College Press.

Vining, G. G. (1998). Statistical methods for engineers. Duxbury Press.

Weibull, W. (1939). The phenomenon of rupture in solids. Ingeniörs Vetenskaps Akademien, 153, 1–55.

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076

Examples

Example 1: Text Pandas Series

>>> import pandas as pd
>>> student_df = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv', sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = student_df['Teach_Motivate']
>>> order = {"Fully Disagree":1, "Disagree":2, "Neither disagree nor agree":3, "Agree":4, "Fully agree":5}
>>> me_quantiles(ex1, levels=order)
(0    1.0
1    1.0
2    2.0
3    3.0
4    5.0
dtype: float64, ['Fully Disagree', 'Fully Disagree', 'Disagree', 'Neither disagree nor agree', 'Fully agree'])

Example 2: Numeric data

>>> ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
>>> me_quantiles(ex2)
0    1.0
1    2.0
2    4.0
3    5.0
4    5.0
dtype: float64

Example 3: Text data

>>> ex3 = ["a", "b", "f", "d", "e", "c"]
>>> order = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6}
>>> me_quantiles(ex3, levels=order)
(0    1.0
1    1.5
2    3.0
3    4.5
4    6.0
dtype: float64, ['a', 'between a and b', 'c', 'between d and e', 'f'])
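The last step, turning numeric quantiles back into labels as in Example 3, can be sketched as follows. It assumes every code in levels is unique and that the codes are consecutive integers, so that both neighbours of a fractional quantile exist; `quantile_labels` is a hypothetical name, not part of stikpetP.

```python
import math

def quantile_labels(quantiles, levels):
    """Map numeric quantiles to the labels of levels (a dict label -> integer code)."""
    code_to_label = {code: label for label, code in levels.items()}
    labels = []
    for q in quantiles:
        if q == math.floor(q):
            # the quantile hits a code exactly
            labels.append(code_to_label[int(q)])
        else:
            # the quantile lies between two neighbouring codes
            below = code_to_label[math.floor(q)]
            above = code_to_label[math.ceil(q)]
            labels.append(f"between {below} and {above}")
    return labels

order = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6}
print(quantile_labels([1.0, 1.5, 3.0, 4.5, 6.0], order))
```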

Expand source code

def me_quantiles(data, levels=None, k=4, method="own", indexMethod="sas1", qLfrac="linear", qLint="int", qHfrac="linear", qHint="int"):
    '''
    Quantiles
    ---------
    
    Quantiles split the data into k sections, each containing n/k scores. They can be seen as a generalisation of various 'tiles'. For example 4-quantiles is the same as the quartiles, 5-quantiles the same as quintiles, 100-quantiles the same as percentiles, etc.
    
    Quite a few different methods exist to determine these. See the notes for more information.

    This function is shown in this [YouTube video](https://youtu.be/iI07nJ3wlOQ) and the measure is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Measures/Quantiles.html)
    
    Parameters
    ----------
    data : list or pandas series
    levels : dictionary, optional 
        coding to use
    k : number of quantiles
    method : string, optional 
        which method to use to calculate quantiles
    indexMethod : {"sas1", "sas4", "excel", "hl", "hf8", "hf9"}, optional 
        to indicate which type of indexing to use. Default is "sas1"
    qLfrac : {"linear", "down", "up", "bankers", "nearest", "halfdown", "midpoint"}, optional 
        to indicate what type of rounding to use for quantiles below 50 percent. Default is "linear"
    qLint : {"int", "midpoint"}, optional 
        to indicate the use of the integer or the midpoint method for first quarter. Default is "int"
    qHfrac : {"linear", "down", "up", "bankers", "nearest", "halfdown", "midpoint"}, optional 
        to indicate what type of rounding to use for quantiles equal or above 50 percent. Default is "linear"
    qHint : {"int", "midpoint"}, optional  
        to indicate the use of the integer or the midpoint method for quantiles equal or above 50 percent. Default is "int"
    
    method can be set to "own" and then provide the next parameters, or any of the methods listed in the notes.
    
    Returns
    -------
    results : the quantiles, or if levels are used also additionally text versions
    
    Notes
    -----
    To determine the quartiles a specific indexing method can be used. See **he_quantileIndexing()** for details on the different methods to choose from.
    
    Then based on the indexes either linear interpolation or different rounding methods (bankers, nearest, down, up, half-down) can be used, or the midpoint between the two values. If the index is an integer either the integer or the mid point is used. 
    
    See the **he_quantilesIndex()** for details on this.
    
    Note that the rounding method can even vary per quantile, i.e. the one used for the ones below the median being different than the one those equal or above.

    I've come across the following methods:

    |method|indexing|q1 integer|q1 fractional|q3 integer|q3 fractional|
    |------|--------|----------|-------------|----------|-------------|
    |sas1|sas1|use int|linear|use int|linear|
    |sas2|sas1|use int|bankers|use int|bankers|
    |sas3|sas1|use int|up|use int|up|
    |sas5|sas1|midpoint|up|midpoint|up|
    |hf3b|sas1|use int|nearest|use int|halfdown|
    |sas4|sas4|use int|linear|use int|linear|
    |ms|sas4|use int|nearest|use int|halfdown|
    |lohninger|sas4|use int|nearest|use int|nearest|
    |hl2|hl|use int|linear|use int|linear|
    |hl1|hl|use int|midpoint|use int|midpoint|
    |excel|excel|use int|linear|use int|linear|
    |pd2|excel|use int|down|use int|down|
    |pd3|excel|use int|up|use int|up|
    |pd4|excel|use int|halfdown|use int|nearest|
    |pd5|excel|use int|midpoint|use int|midpoint|
    |hf8|hf8|use int|linear|use int|linear|
    |hf9|hf9|use int|linear|use int|linear|

    The following values can be used for the *method* parameter:

    1. sas1 = parzen = hf4 = interpolated_inverted_cdf = maple3 = r4. (Parzen, 1979, p. 108; SAS, 1990, p. 626; Hyndman & Fan, 1996, p. 363)
    1. sas2 = hf3 = r3. (SAS, 1990, p. 626; Hyndman & Fan, 1996, p. 362)
    1. sas3 = hf1 = inverted_cdf = maple1 = r1 (SAS, 1990, p. 626; Hyndman & Fan, 1996, p. 362)
    1. sas4 = hf6 = minitab = snedecor = weibull = maple5 = r6 (Hyndman & Fan, 1996, p. 363; Weibull, 1939, p. ?; Snedecor, 1940, p. 43; SAS, 1990, p. 626)
    1. sas5 = hf2 = CDF = averaged_inverted_cdf = r2 (SAS, 1990, p. 626; Hyndman & Fan, 1996, p. 362)
    1. hf3b = closest_observation 
    1. ms (Mendenhall & Sincich, 1992, p. 35)
    1. lohninger (Lohninger, n.d.)
    1. hl1 (Hogg & Ledolter, 1992, p. 21)
    1. hl2 = hf5 = Hazen = maple4 = r5 (Hogg & Ledolter, 1992, p. 21; Hazen, 1914, p. ?)
    1. maple2
    1. excel = hf7 = pd1 = linear = gumbel = maple6 = r7 (Hyndman & Fan, 1996, p. 363; Freund & Perles, 1987, p. 201; Gumbel, 1939, p. ?)
    1. pd2 = lower
    1. pd3 = higher
    1. pd4 = nearest
    1. pd5 = midpoint
    1. hf8 = median_unbiased = maple7 = r8 (Hyndman & Fan, 1996, p. 363)
    1. hf9 = normal_unbiased = maple8 = r9 (Hyndman & Fan, 1996, p. 363)

    *hf* is short for Hyndman and Fan who wrote an article showcasing many different methods, *hl* is short for Hog and Ledolter, *ms* is short for Mendenhall and Sincich, *jf* is short for Joarder and Firozzaman. *sas* refers to the software package SAS, *maple* to Maple, *pd* to Python's pandas library, and *r* to R.
    
    The names *linear*, *lower*, *higher*, *nearest* and *midpoint* are all used by pandas quantile function and numpy percentile function. Numpy also uses *inverted_cdf*, *averaged_inverted_cdf*, *closest_observation*, *interpolated_inverted_cdf*, *hazen*, *weibull*, *median_unbiased*, and *normal_unbiased*. 

    Before, After and Alternatives
    ------------------------------
    Before this measure you might want an impression using a frequency table or a visualisation:
    * [tab_frequency](../other/table_frequency.html#tab_frequency) for a frequency table
    * [vi_bar_stacked_single](../visualisations/vis_bar_stacked_single.html#vi_bar_stacked_single) for Single Stacked Bar-Chart
    * [vi_bar_dual_axis](../visualisations/vis_bar_dual_axis.html#vi_bar_dual_axis) for Dual-Axis Bar Chart

    After this you might want some other descriptive measures:
    * [me_consensus](../measures/meas_consensus.html#me_consensus) for the Consensus
    * [me_hodges_lehmann_os](../measures/meas_hodges_lehmann_os.html#me_hodges_lehmann_os) for the Hodges-Lehmann Estimate (One-Sample)
    * [me_median](../measures/meas_median.html#me_median) for the Median
    * [me_quartiles](../measures/meas_quartiles.html#me_quantiles) for Quartiles / Hinges
    * [me_quartile_range](../measures/meas_quartile_range.html#me_quartile_range) for Interquartile Range, Semi-Interquartile Range and Mid-Quartile Range
    
    or perform a test:
    * [ts_sign_os](../tests/test_sign_os.html#ts_sign_os) for One-Sample Sign Test
    * [ts_trinomial_os](../tests/test_trinomial_os.html#ts_trinomial_os) for One-Sample Trinomial Test
    * [ts_wilcoxon_os](../tests/test_wilcoxon_os.html#ts_wilcoxon_os) for Wilcoxon Signed Rank Test (One-Sample)

    For more information on the quartile indexing methods and index itself:
    * [he_quantileIndexing](../helper/help_quantileIndexing.html#he_quartileIndexing)
    * [he_quantilesIndex](../helper/help_quantileIndex.html#he_quartilesIndex)
    
    References
    ----------
    Freund, J. E., & Perles, B. M. (1987). A new look at quartiles of ungrouped data. *The American Statistician, 41*(3), 200–203. doi:10.1080/00031305.1987.10475479

    Galton, F. (1881). Report of the anthropometric committee. *Report of the British Association for the Advancement of Science, 51*, 225–272.

    Gumbel, E. J. (1939). La Probabilité des Hypothèses. *Compes Rendus de l’ Académie des Sciences, 209*, 645–647.

    Hazen, A. (1914). Storage to be provided in impounding municipal water supply. *Transactions of the American Society of Civil Engineers, 77*(1), 1539–1640. doi:10.1061/taceat.0002563

    Hogg, R. V., & Ledolter, J. (1992). *Applied statistics for engineers and physical scientists* (2nd int.). Macmillan.

    Hyndman, R. J., & Fan, Y. (1996). Sample quantiles in statistical packages. *The American Statistician, 50*(4), 361–365. doi:10.2307/2684934

    Langford, E. (2006). Quartiles in elementary statistics. *Journal of Statistics Education, 14*(3), 1–17. doi:10.1080/10691898.2006.11910589

    Lohninger, H. (n.d.). Quartile. Fundamentals of Statistics. Retrieved April 7, 2023, from http://www.statistics4u.com/fundstat_eng/cc_quartile.html

    McAlister, D. (1879). The law of the geometric mean. *Proceedings of the Royal Society of London, 29*(196–199), 367–376. doi:10.1098/rspl.1879.0061

    Mendenhall, W., & Sincich, T. (1992). *Statistics for engineering and the sciences* (3rd ed.). Dellen Publishing Company.

    Parzen, E. (1979). Nonparametric statistical data modeling. *Journal of the American Statistical Association, 74*(365), 105–121. doi:10.1080/01621459.1979.10481621

    SAS. (1990). *SAS procedures guide: Version 6* (3rd ed.). SAS Institute.

    Siegel, A. F., & Morgan, C. J. (1996). *Statistics and data analysis: An introduction* (2nd ed.). J. Wiley.

    Snedecor, G. W. (1940). *Statistical methods applied to experiments in agriculture and biology* (3rd ed.). The Iowa State College Press.

    Vining, G. G. (1998). *Statistical methods for engineers*. Duxbury Press.

    Weibull, W. (1939). *The phenomenon of rupture in solids*. Ingeniörs Vetenskaps Akademien, 153, 1–55.
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076


    Examples
    --------
    Example 1: Text Pandas Series
    >>> import pandas as pd
    >>> student_df = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv', sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
    >>> ex1 = student_df['Teach_Motivate']
    >>> order = {"Fully Disagree":1, "Disagree":2, "Neither disagree nor agree":3, "Agree":4, "Fully agree":5}
    >>> me_quantiles(ex1, levels=order)
    (0    1.0
    1    1.0
    2    2.0
    3    3.0
    4    5.0
    dtype: float64, ['Fully Disagree', 'Fully Disagree', 'Disagree', 'Neither disagree nor agree', 'Fully agree'])
    
    Example 2: Numeric data
    >>> ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
    >>> me_quantiles(ex2)
    0    1.0
    1    2.0
    2    4.0
    3    5.0
    4    5.0
    dtype: float64
    
    Example 3: Text data
    >>> ex3 = ["a", "b", "f", "d", "e", "c"]
    >>> order = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6}
    >>> me_quantiles(ex3, levels=order)
    (0    1.0
    1    1.5
    2    3.0
    3    4.5
    4    6.0
    dtype: float64, ['a', 'between a and b', 'c', 'between d and e', 'f'])
    
    '''
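    # accept a plain list as well as a pandas Series
    if isinstance(data, list):
        data = pd.Series(data)
        
    # remove missing values before ordering
    data = data.dropna()
    if levels is not None:
        # replace each category label by its numeric code
        # (nullable Int8, so codes must lie between -128 and 127)
        pd.set_option('future.no_silent_downcasting', True)
        dataN = data.map(levels).astype('Int8')
    else:
        dataN = pd.to_numeric(data)
    
    # sort ascending so that positions correspond to ranks
    dataN = dataN.sort_values().reset_index(drop=True)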
    
    #alternative namings
    if method in ["cdf", "sas5", "hf2", "averaged_inverted_cdf", "r2"]:
        method = "sas5"
    elif method in ["sas4", "minitab", "hf6", "weibull", "maple5", "r6"]:
        method = "sas4"
    elif method in ["excel", "hf7", "pd1", "linear", "gumbel", "maple6", "r7"]:
        method = "excel"
    elif method in ["sas1", "parzen", "hf4", "interpolated_inverted_cdf", "maple3", "r4"]:
        method = "sas1"
    elif method in ["sas2", "hf3", "r3"]:
        method = "sas2"
    elif method in ["sas3", "hf1", "inverted_cdf", "maple1", "r1"]:
        method = "sas3"
    elif method in ["hf3b", "closest_observation"]:
        method = "hf3b"
    elif method in ["hl2", "hazen", "hf5", "maple4"]:
        method = "hl2"
    elif method in ["np", "midpoint", "pd5"]:
        method = "pd5"
    elif method in ["hf8", "median_unbiased", "maple7", "r8"]:
        method = "hf8"
    elif method in ["hf9", "normal_unbiased", "maple8", "r9"]:
        method = "hf9"
    elif method in ["pd2", "lower"]:
        method = "pd2"
    elif method in ["pd3", "higher"]:
        method = "pd3"
    elif method in ["pd4", "nearest"]:
        method = "pd4"
    
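    # settings: [indexMethod, qLfrac, qLint, qHfrac, qHint]
    # any method not listed below (e.g. "own") keeps the values passed as arguments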
    settings = [indexMethod, qLfrac, qLint, qHfrac, qHint]
    if method=="sas1":
        settings = ["sas1","linear","int","linear","int"]
    elif method=="sas2":
        settings = ["sas1","bankers","int","bankers" ,"int"]
    elif method=="sas3":
        settings = ["sas1","up","int","up","int"]
    elif method=="sas5":
        settings = ["sas1","up","midpoint","up","midpoint"]
    elif method=="sas4":    
        settings = ["sas4","linear", "int","linear","int"]
    elif method=="ms": 
        settings = ["sas4", "nearest","int", "halfdown","int"]
    elif method=="lohninger":
        settings = ["sas4", "nearest", "int","nearest","int"]
    elif method=="hl2":
        settings = ["hl", "linear", "int","linear","int"]
    elif method=="hl1":
        settings = ["hl", "midpoint","int", "midpoint","int"]
    elif method=="excel":
        settings = ["excel", "linear","int","linear", "int"]
    elif method=="pd2":
        settings = ["excel", "down", "int", "down","int"]
    elif method=="pd3":
        settings = ["excel", "up","int","up","int"]
    elif method=="pd4":
        settings = ["excel", "halfdown",  "int","nearest", "int"]
    elif method=="hf3b":
        settings = ["sas1", "nearest","int","halfdown","int"]
    elif method=="pd5":
        settings = ["excel", "midpoint","int","midpoint","int"]
    elif method=="hf8":
        settings = ["hf8", "linear","int","linear", "int"]
    elif method=="hf9":
        settings = ["hf9", "linear","int","linear", "int"]
    elif method=="maple2":
        settings = ["hl", "down","int","down", "int"]
    
    quantiles = he_quantileIndex(dataN, k, settings[0], settings[1], settings[2], settings[3], settings[4])
    #he_quantileIndex(data, k=4, indexMethod="sas1", qLfrac="linear", qLint="int", qHfrac="linear", qHint="int")
    # find the text representatives
    if levels is not None:
        labels = list(levels.keys())
        codes = list(levels.values())
        quantilesText = []
        for i in range(k+1):
            if quantiles[i] == round(quantiles[i]):
                # the quantile coincides with one category
                qT = labels[codes.index(quantiles[i])]
            else:
                # the quantile falls between two adjacent categories
                qT = "between " + labels[codes.index(math.floor(quantiles[i]))] + " and " + labels[codes.index(math.ceil(quantiles[i]))]
            quantilesText.append(qT)
            
        results = quantiles, quantilesText
    else:
        results = quantiles
    
    return results
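The alias list in `me_quantiles` maps `hf4`, `parzen` and `interpolated_inverted_cdf` onto the default `sas1` method, and `hf1` and `inverted_cdf` onto `sas3`. In the settings list, both methods use the `sas1` index and differ only in how a fractional position is rounded (`linear` versus `up`).

The following is a minimal standalone sketch of those two rules as described by Hyndman and Fan (1996). It does not call `he_quantileIndex`. It assumes that the `sas1` index is the 1-based position n·p and that positions outside 1..n are clamped to the smallest or largest score. The helper names `hf4_quantile` and `hf1_quantile` are hypothetical.

```python
import math


def _sorted_and_position(values, p):
    """Return the sorted scores and the 1-based position h = n * p.

    Assumption: this mirrors the 'sas1' index (SAS definition 1).
    """
    x = sorted(values)
    return x, len(x) * p


def hf4_quantile(values, p):
    """Hyndman & Fan type 4: linear interpolation at h = n * p (sas1 / linear)."""
    x, h = _sorted_and_position(values, p)
    n = len(x)
    if h <= 1:  # positions at or below the first score give the minimum
        return float(x[0])
    if h >= n:  # positions at or beyond the last score give the maximum
        return float(x[-1])
    j = math.floor(h)
    g = h - j  # fractional part of the position
    # x[j - 1] is the j-th smallest score when counting from 1
    return x[j - 1] + g * (x[j] - x[j - 1])


def hf1_quantile(values, p):
    """Hyndman & Fan type 1: round a fractional h = n * p up (sas1 / up)."""
    x, h = _sorted_and_position(values, p)
    j = min(max(math.ceil(h), 1), len(x))  # keep the position within 1..n
    return float(x[j - 1])


codes = [1, 2, 6, 4, 5, 3]  # codes of ["a", "b", "f", "d", "e", "c"] from Example 3
k = 4
print([hf4_quantile(codes, i / k) for i in range(k + 1)])  # [1.0, 1.5, 3.0, 4.5, 6.0]
print([hf1_quantile(codes, i / k) for i in range(k + 1)])  # [1.0, 2.0, 3.0, 5.0, 6.0]
```

The type 4 rule reproduces the numeric output documented in Example 3. It also gives [1.0, 2.0, 4.0, 5.0, 5.0] for the 18 scores of Example 2, which matches the documented output there. Rounding up instead of interpolating, as the `sas3` settings do, moves each fractional position to the next whole score.

For comparison, NumPy 1.22 and later expose the same two estimators as `numpy.quantile(..., method="interpolated_inverted_cdf")` and `method="inverted_cdf"`.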
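When `levels` is given, the last loop of `me_quantiles` converts each numeric quantile back into a label by searching the values of the `levels` dictionary.

- A whole number maps to its own category.
- A fractional value becomes "between <lower> and <upper>", using `math.floor` and `math.ceil` to find the neighbouring codes.

Below is a minimal sketch of that lookup as a standalone helper. `quantile_label` is a hypothetical name and is not part of the package. The sketch assumes consecutive integer codes, as in the examples.

```python
import math


def quantile_label(q, levels):
    """Map a numeric quantile back to its category label(s).

    Like the loop in me_quantiles, list.index returns the first match, so
    duplicate codes resolve to the earliest key in the dictionary.
    """
    labels = list(levels.keys())
    codes = list(levels.values())
    if q == round(q):  # the quantile lands exactly on one category
        return labels[codes.index(q)]
    lower = labels[codes.index(math.floor(q))]
    upper = labels[codes.index(math.ceil(q))]
    return "between " + lower + " and " + upper


order = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6}
print([quantile_label(q, order) for q in [1.0, 1.5, 3.0, 4.5, 6.0]])
# ['a', 'between a and b', 'c', 'between d and e', 'f']
```

A float such as `3.0` still finds the integer code `3`, because `list.index` compares with `==` and `3 == 3.0` is true in Python.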