Module `stikpetP.measures.meas_quartile_range`

import pandas as pd
from .meas_quartiles import me_quartiles

def me_quartile_range(data, levels=None, measure="iqr", method="cdf"):
    '''
    Interquartile Range, Semi-Interquartile Range and Mid-Quartile Range
    --------------------------------------------------------------------
    
    Some measures of dispersion use the quartiles instead of the full range (i.e. maximum minus minimum). The advantage is that they are less influenced by extreme values.
    
    The Interquartile Range (Galton, 1881, p. 245) is the difference between the third and the first quartile. If Tukey's method for the quartiles is used (*method="tukey"*), in which the quartiles are referred to as hinges, it is also known as the H-spread (Tukey, 1977, p. 44).
    
    Yule (1911, p. 147) used half the interquartile range and labelled this the Semi-Interquartile Range, a term he preferred over Quartile Deviation.
    
    There is also a measure of central tendency that uses the quartiles, the Mid-Quartile (Parzen, 1980, p. 19), which is the average of the first and third quartile. It is sometimes referred to as the Mid-Quartile Range; see for example Luo et al. (2018, p. 2), who refer to Triola, although Triola (2010, p. 120) does not add the word 'range'.
    
    The function uses the *me_quartiles* function and any of the methods from that function can be used.

    This function is shown in this [YouTube video](https://youtu.be/CZ-Lx6rsXXE) and the measure is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Measures/QuartileRanges.html)
    
    Parameters
    ----------
    data : list or pandas series 
        the scores as numbers, or if text also provide levels
    levels : dictionary, optional 
        levels in order
    measure : {"iqr", "siqr", "qd", "mqr"}, optional 
        the specific measure to determine. Default is "iqr"
    method : string, optional 
        the method to use to determine the quartiles. See me_quartiles for options
    
    Returns
    -------
    pandas.DataFrame
        A dataframe with the following columns:
    
        * *Q1*, the first (lower) quartile
        * *Q3*, the third (upper/higher) quartile
        * the measure determined, in a column named *IQR*, *Hspread*, *SIQR* or *MQR* depending on *measure* and *method*
    
    Notes
    -----
    The formula used for the Interquartile Range is:
    $$IQR = Q_3 - Q_1$$
    
    This can be obtained by setting *measure="iqr"*.
    
    The IQR is mentioned in Galton (1881, p. 245) and the H-spread in Tukey (1977, p. 44).
    
    The H-spread can be obtained by setting *measure="iqr"* and *method="tukey"*.
    
    The formula used for the Semi-Interquartile Range (Quartile Deviation) is (Yule, 1911, p. 147):
    $$SIQR = \\frac{Q_3 - Q_1}{2}$$
    
    This can be obtained by setting *measure="siqr"* or *measure="qd"*.
    
    The formula used for the Mid-Quartile Range is:
    $$MQR = \\frac{Q_3 + Q_1}{2}$$
    
    This can be obtained by setting *measure="mqr"*.
    This formula can be found in Parzen (1980, p. 19), but there are probably older references.
    
    Before, After and Alternatives
    ------------------------------
    Before this measure you might want an impression using a frequency table or a visualisation:
    * [tab_frequency](../other/table_frequency.html#tab_frequency) for a frequency table
    * [vi_bar_stacked_single](../visualisations/vis_bar_stacked_single.html#vi_bar_stacked_single) for Single Stacked Bar-Chart
    * [vi_bar_dual_axis](../visualisations/vis_bar_dual_axis.html#vi_bar_dual_axis) for Dual-Axis Bar Chart

    After this you might want some other descriptive measures:
    * [me_consensus](../measures/meas_consensus.html#me_consensus) for the Consensus
    * [me_hodges_lehmann_os](../measures/meas_hodges_lehmann_os.html#me_hodges_lehmann_os) for the Hodges-Lehmann Estimate (One-Sample)
    * [me_median](../measures/meas_median.html#me_median) for the Median
    * [me_quantiles](../measures/meas_quantiles.html#me_quantiles) for Quantiles
    * [me_quartiles](../measures/meas_quartiles.html#me_quartiles) for Quartiles / Hinges
        
    or perform a test:
    * [ts_sign_os](../tests/test_sign_os.html#ts_sign_os) for One-Sample Sign Test
    * [ts_trinomial_os](../tests/test_trinomial_os.html#ts_trinomial_os) for One-Sample Trinomial Test
    * [ts_wilcoxon_os](../tests/test_wilcoxon_os.html#ts_wilcoxon_os) for Wilcoxon Signed Rank Test (One-Sample)
    
    References 
    ----------
    Galton, F. (1881). Report of the anthropometric committee. *Report of the British Association for the Advancement of Science, 51*, 225–272.
    
    Luo, D., Wan, X., Liu, J., & Tong, T. (2018). Optimally estimating the sample mean from the sample size, median, mid-range, and/or mid-quartile range. *Statistical Methods in Medical Research, 27*(6), 1785–1805. doi:10.1177/0962280216669183
    
    Parzen, E. (1980). *Data modeling using quantile and density-quantile functions*. Institute of Statistics, Texas A&M University.
    
    Triola, M. F. (2010). *Elementary statistics* (11th ed). Addison-Wesley.
    
    Tukey, J. W. (1977). *Exploratory data analysis*. Addison-Wesley Pub. Co.
    
    Yule, G. U. (1911). *An introduction to the theory of statistics*. Charles Griffin.
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076
    
    Examples
    --------
    Example 1: Text Pandas Series
    >>> import pandas as pd
    >>> df2 = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv', sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
    >>> ex1 = df2['Teach_Motivate']
    >>> order = {"Fully Disagree":1, "Disagree":2, "Neither disagree nor agree":3, "Agree":4, "Fully agree":5}
    >>> me_quartile_range(ex1, levels=order)
       Q1  Q3  IQR
    0   1   3    2

    Example 2: Numeric data
    >>> ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
    >>> me_quartile_range(ex2)
       Q1  Q3  IQR
    0   2   5    3
    
    
    '''
    
    qs = me_quartiles(data, levels=levels, method=method)
    q1 = qs.iloc[0,0]
    q3 = qs.iloc[0,1]
    
    if (measure=="iqr"):
        r = q3 - q1
        if (method in ["inclusive", "tukey", "vining", "hinges"]):
            rName = "Hspread"
        else:
            rName = "IQR"
    elif (measure=="siqr" or measure=="qd"):
        r = (q3 - q1)/2
        rName = "SIQR"
        
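    elif (measure=="mqr"):
        r = (q3 + q1)/2
        rName = "MQR"
    else:
        # fail early: without this branch r and rName would be undefined below
        raise ValueError('measure must be one of "iqr", "siqr", "qd" or "mqr"')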
        
    res = pd.DataFrame([[q1, q3, r]], columns=["Q1", "Q3", rName])
        
    return res

Functions

def me_quartile_range(data, levels=None, measure='iqr', method='cdf')

Interquartile Range, Semi-Interquartile Range and Mid-Quartile Range

Some measures of dispersion use the quartiles instead of the full range (i.e. maximum minus minimum). The advantage is that they are less influenced by extreme values.

The Interquartile Range (Galton, 1881, p. 245) is the difference between the third and the first quartile. If Tukey's method for the quartiles is used (method="tukey"), in which the quartiles are referred to as hinges, it is also known as the H-spread (Tukey, 1977, p. 44).

Yule (1911, p. 147) used half the interquartile range and labelled this the Semi-Interquartile Range, a term he preferred over Quartile Deviation.

There is also a measure of central tendency that uses the quartiles, the Mid-Quartile (Parzen, 1980, p. 19), which is the average of the first and third quartile. It is sometimes referred to as the Mid-Quartile Range; see for example Luo et al. (2018, p. 2), who refer to Triola, although Triola (2010, p. 120) does not add the word 'range'.

The function uses the me_quartiles function and any of the methods from that function can be used.

This function is shown in this YouTube video and the measure is also described at PeterStatistics.com

Parameters

data : list or pandas series: the scores as numbers, or if text also provide levels
levels : dictionary, optional: levels in order
measure : {"iqr", "siqr", "qd", "mqr"}, optional: the specific measure to determine. Default is "iqr"
method : string, optional: the method to use to determine the quartiles. See me_quartiles for options
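
As a rough illustration of the levels parameter, the sketch below recodes text answers to their ordinal ranks with a plain dictionary lookup. The helper name `recode_levels` is hypothetical and not part of stikpetP, and how `me_quartiles` applies the dictionary internally is an assumption; the sketch only shows the idea of supplying the order of the categories.

```python
def recode_levels(data, levels):
    """Replace each text label by its rank from the levels dictionary.

    Hypothetical helper for illustration only. Labels that are missing
    from `levels` are skipped, which is an assumption of this sketch.
    """
    return [levels[label] for label in data if label in levels]


order = {"Fully Disagree": 1, "Disagree": 2, "Neither disagree nor agree": 3,
         "Agree": 4, "Fully agree": 5}
answers = ["Agree", "Fully Disagree", "Disagree", "Agree", "n/a"]

codes = recode_levels(answers, order)
print(codes)  # [4, 1, 2, 4]
```

Once the labels are numbers, quartiles can be determined on the ordered codes.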

Returns

pandas.DataFrame

A dataframe with the following columns:

Q1, the first (lower) quartile
Q3, the third (upper/higher) quartile
the measure determined, in a column named IQR, Hspread, SIQR or MQR depending on measure and method
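
The name of the third column depends on the arguments. The sketch below mirrors the branching from the source listing as a standalone function; `result_column_name` is a hypothetical name used only for illustration, and in this sketch an unknown measure raises a `ValueError`.

```python
def result_column_name(measure="iqr", method="cdf"):
    """Return the label used for the third column (illustrative mirror)."""
    if measure == "iqr":
        # hinge-based quartile methods label the difference as H-spread
        if method in ["inclusive", "tukey", "vining", "hinges"]:
            return "Hspread"
        return "IQR"
    if measure in ("siqr", "qd"):
        return "SIQR"
    if measure == "mqr":
        return "MQR"
    # unknown measure: this sketch raises an error
    raise ValueError('measure must be one of "iqr", "siqr", "qd" or "mqr"')


print(result_column_name())                 # IQR
print(result_column_name("iqr", "tukey"))   # Hspread
print(result_column_name("qd"))             # SIQR
```

Because the label varies, selecting the third column by position rather than by name is the more robust way to read the result.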

Notes

The formula used for the Interquartile Range is: $IQR = Q_3 - Q_1$

This can be obtained by setting measure="iqr".

The IQR is mentioned in Galton (1881, p. 245) and the H-spread in Tukey (1977, p. 44).

The H-spread can be obtained by setting measure="iqr" and method="tukey".
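
To make the H-spread concrete, here is a minimal sketch of hinges computed with the usual depth rule (depth of the median is (n + 1)/2, depth of a hinge is (floor of that + 1)/2). The function name `tukey_hinges` is hypothetical, and it is an assumption that this agrees with what `me_quartiles` does for method="tukey"; the data are the numeric scores from Example 2.

```python
import math


def tukey_hinges(values):
    """Lower and upper hinge via the depth rule (illustrative sketch)."""
    xs = sorted(values)
    n = len(xs)
    median_depth = (n + 1) / 2
    hinge_depth = (math.floor(median_depth) + 1) / 2
    lo = math.floor(hinge_depth)  # 1-based depth, counted from either end
    hi = math.ceil(hinge_depth)
    # average the two neighbouring values when the depth ends in .5
    lower = (xs[lo - 1] + xs[hi - 1]) / 2
    upper = (xs[n - lo] + xs[n - hi]) / 2
    return lower, upper


ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
lower, upper = tukey_hinges(ex2)
print(lower, upper)   # 2.0 5.0
print(upper - lower)  # 3.0, the H-spread
```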

The formula used for the Semi-Interquartile Range (Quartile Deviation) is (Yule, 1911, p. 147): $SIQR = \frac{Q_3 - Q_1}{2}$

This can be obtained by setting measure="siqr" or measure="qd".

The formula used for the Mid-Quartile Range is: $MQR = \frac{Q_3 + Q_1}{2}$

This can be obtained by setting measure="mqr". This formula can be found in Parzen (1980, p. 19), but there are probably older references.
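
The three formulas can be checked by hand once the quartiles are known. The sketch below applies them to Q1 = 2 and Q3 = 5, the quartiles reported in Example 2; `quartile_measures` is a hypothetical helper and not part of the package.

```python
def quartile_measures(q1, q3):
    """Apply the three formulas from the Notes to a pair of quartiles."""
    return {
        "IQR": q3 - q1,         # interquartile range
        "SIQR": (q3 - q1) / 2,  # semi-interquartile range (quartile deviation)
        "MQR": (q3 + q1) / 2,   # mid-quartile (range), a measure of centre
    }


measures = quartile_measures(2, 5)
print(measures)  # {'IQR': 3, 'SIQR': 1.5, 'MQR': 3.5}
```

Note that the first two describe spread, while the last one locates the centre of the middle half of the data.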

Before, After and Alternatives

Before this measure you might want an impression using a frequency table or a visualisation:

* tab_frequency for a frequency table
* vi_bar_stacked_single for Single Stacked Bar-Chart
* vi_bar_dual_axis for Dual-Axis Bar Chart

After this you might want some other descriptive measures:

* me_consensus for the Consensus
* me_hodges_lehmann_os for the Hodges-Lehmann Estimate (One-Sample)
* me_median for the Median
* me_quantiles for Quantiles
* me_quartiles for Quartiles / Hinges

or perform a test:

* ts_sign_os for One-Sample Sign Test
* ts_trinomial_os for One-Sample Trinomial Test
* ts_wilcoxon_os for Wilcoxon Signed Rank Test (One-Sample)

References

Galton, F. (1881). Report of the anthropometric committee. Report of the British Association for the Advancement of Science, 51, 225–272.

Luo, D., Wan, X., Liu, J., & Tong, T. (2018). Optimally estimating the sample mean from the sample size, median, mid-range, and/or mid-quartile range. Statistical Methods in Medical Research, 27(6), 1785–1805. doi:10.1177/0962280216669183

Parzen, E. (1980). Data modeling using quantile and density-quantile functions. Institute of Statistics, Texas A&M University.

Triola, M. F. (2010). Elementary statistics (11th ed). Addison-Wesley.

Tukey, J. W. (1977). Exploratory data analysis. Addison-Wesley Pub. Co.

Yule, G. U. (1911). An introduction to the theory of statistics. Charles Griffin.

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076

Examples

Example 1: Text Pandas Series

>>> import pandas as pd
>>> df2 = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv', sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = df2['Teach_Motivate']
>>> order = {"Fully Disagree":1, "Disagree":2, "Neither disagree nor agree":3, "Agree":4, "Fully agree":5}
>>> me_quartile_range(ex1, levels=order)
   Q1  Q3  IQR
0   1   3    2

Example 2: Numeric data

>>> ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
>>> me_quartile_range(ex2)
   Q1  Q3  IQR
0   2   5    3
