Module stikpetP.helper.help_quantileIndexing
```python
import pandas as pd


def he_quantileIndexing(data, k=4, method="sas1"):
    '''
    Quantile Indexing

    Helper function for **me_quantiles()** and **he_quantileIndexing()** that returns the index numbers of the quantiles.

    Parameters
    ----------
    data : pandas series with numeric values
    k : number of quantiles
    method : optional, which method to use to calculate the quantile indexes: "sas1" (default), "sas4", "hl", "excel", "hf8" or "hf9"

    Returns
    -------
    indexes : pandas series with the indexes of the k + 1 quantiles (p_i = 0, 1/k, ..., 1)

    Notes
    -----
    Six alternatives for the indexing are listed below, where n is the number of values in data and $p_i = i/k$ for $i = 0, 1, \\dots, k$. Indexes below 1 are set to 1 and indexes above n are set to n.

    Most basic (**SAS1**):
    $$iQ_i = n\\times p_i$$

    **SAS4** (SAS, 1990, p. 626; Snedecor, 1940, p. 43):
    $$iQ_i = \\left(n + 1\\right)\\times p_i$$

    **Hogg and Ledolter** (Hogg & Ledolter, 1992, p. 21; Hazen, 1914, p. ?):
    $$iQ_i = n\\times p_i + \\frac{1}{2}$$

    **MS Excel** (Gumbel, 1939, p. ?; Hyndman & Fan, 1996, p. 363):
    $$iQ_i = \\left(n - 1\\right)\\times p_i + 1$$

    **Hyndman and Fan**, 8th version (Hyndman & Fan, 1996, p. 363):
    $$iQ_i = \\left(n + \\frac{1}{3}\\right)\\times p_i + \\frac{1}{3}$$

    **Hyndman and Fan**, 9th version (Hyndman & Fan, 1996, p. 364):
    $$iQ_i = \\left(n + \\frac{1}{4}\\right)\\times p_i + \\frac{3}{8}$$

    References
    ----------
    Gumbel, E. J. (1939). La Probabilité des Hypothèses. Comptes Rendus de l'Académie des Sciences, 209, 645–647.

    Hazen, A. (1914). Storage to be provided in impounding municipal water supply. Transactions of the American Society of Civil Engineers, 77(1), 1539–1640. https://doi.org/10.1061/taceat.0002563

    Hogg, R. V., & Ledolter, J. (1992). Applied statistics for engineers and physical scientists (2nd int.). Macmillan.

    Hyndman, R. J., & Fan, Y. (1996). Sample quantiles in statistical packages. The American Statistician, 50(4), 361–365. https://doi.org/10.2307/2684934

    SAS. (1990). SAS procedures guide: Version 6 (3rd ed.). SAS Institute.

    Snedecor, G. W. (1940). Statistical methods applied to experiments in agriculture and biology (3rd ed.). The Iowa State College Press.

    Author
    ------
    Made by P. Stikker

    Please visit: https://PeterStatistics.com

    YouTube channel: https://www.youtube.com/stikpet
    '''
    n = len(data)

    # proportions p_i = i/k for i = 0, 1, ..., k
    props = pd.Series([i / k for i in range(k + 1)], dtype='float64')

    if method == "sas1":
        indexes = n*props
    elif method == "sas4":
        indexes = (n + 1)*props
    elif method == "hl":
        indexes = n*props + 1/2
    elif method == "excel":
        indexes = (n - 1)*props + 1
    elif method == "hf8":
        indexes = (n + 1/3)*props + 1/3
    elif method == "hf9":
        indexes = (n + 1/4)*props + 3/8
    else:
        # previously an unknown method silently returned the bare proportions
        raise ValueError(f"unknown method: {method!r}")

    # adjust for minimum and maximum: every index must lie between 1 and n
    indexes = indexes.clip(lower=1, upper=n)
    return indexes
```
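To show what the final adjustment step does, the sketch below recomputes the raw SAS1 and Hyndman and Fan 9 indexes for a hypothetical sample size of n = 5 with k = 4, and then restricts them to the range 1 to n with pandas `Series.clip`, which has the same effect as the function's adjustment for the minimum and maximum. The sample size is an assumption chosen so that both the lower and the upper bound are hit.

```python
import pandas as pd

n, k = 5, 4  # hypothetical sample size and number of quantiles

# proportions p_i = i/k for i = 0, ..., k
p = pd.Series([i / k for i in range(k + 1)])

raw_sas1 = n * p                    # SAS1: n * p_i, gives 0 for p_0
raw_hf9 = (n + 1 / 4) * p + 3 / 8   # HF9: (n + 1/4) * p_i + 3/8, exceeds n for p_k

# clipping to [1, n] sets indexes below 1 to 1 and indexes above n to n
print(raw_sas1.clip(lower=1, upper=n).tolist())  # [1.0, 1.25, 2.5, 3.75, 5.0]
print(raw_hf9.clip(lower=1, upper=n).tolist())   # [1.0, 1.6875, 3.0, 4.3125, 5.0]
```

Without the adjustment, SAS1 would return index 0 for the minimum and HF9 would return 5.625 for the maximum, neither of which points at an existing value.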
Functions
def he_quantileIndexing(data, k=4, method='sas1')
Quantile Indexing
Helper function for me_quantiles() and he_quantileIndexing() that returns the index numbers of the quantiles.
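Parameters
data : pandas series with numeric values
k : number of quantiles
method : optional, which method to use to calculate the quantile indexes: "sas1" (default), "sas4", "hl", "excel", "hf8" or "hf9"
Returns
indexes : pandas series with the indexes of the k + 1 quantiles (p_i = 0, 1/k, ..., 1)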
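The returned indexes are 1-based positions in the sorted data (they are kept between 1 and n) and may be fractional. As an illustration only, the sketch below assumes that a fractional index is turned into a value by linear interpolation between the two neighbouring sorted values. `value_at_index` is a hypothetical name, the sample data are made up, and me_quantiles() may round or interpolate differently.

```python
import math

import pandas as pd


def value_at_index(data, index):
    """Hypothetical helper: value at a 1-based, possibly fractional index.

    Assumes linear interpolation between neighbouring order statistics;
    this is an illustration, not necessarily what me_quantiles() does.
    """
    ordered = pd.Series(data).sort_values().reset_index(drop=True)
    lower = math.floor(index)       # 1-based position of the lower neighbour
    fraction = index - lower        # how far to move towards the next value
    low_value = ordered.iloc[lower - 1]
    if fraction == 0 or lower >= len(ordered):
        return float(low_value)
    high_value = ordered.iloc[lower]  # next sorted value (position lower + 1)
    return float(low_value + fraction * (high_value - low_value))


# made-up sample of n = 10 values; sorted: 1, 3, 5, ..., 19
data = [7, 1, 5, 3, 9, 11, 13, 15, 17, 19]
print(value_at_index(data, 2.75))  # 4.5, i.e. 3 + 0.75 * (5 - 3)
```

Index 2.75 is what the SAS4 formula gives for the first quartile when n = 10, so under this assumption it lies three quarters of the way from the 2nd to the 3rd sorted value.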
Notes
Six alternatives for the indexing are listed below, where n is the number of values in data and p_i = i/k for i = 0, 1, ..., k. Indexes below 1 are set to 1 and indexes above n are set to n.
Most basic (SAS1):
$$iQ_i = n\times p_i$$
SAS4 (SAS, 1990, p. 626; Snedecor, 1940, p. 43):
$$iQ_i = \left(n + 1\right)\times p_i$$
Hogg and Ledolter (Hogg & Ledolter, 1992, p. 21; Hazen, 1914, p. ?):
$$iQ_i = n\times p_i + \frac{1}{2}$$
MS Excel (Gumbel, 1939, p. ?; Hyndman & Fan, 1996, p. 363):
$$iQ_i = \left(n - 1\right)\times p_i + 1$$
Hyndman and Fan, 8th version (Hyndman & Fan, 1996, p. 363):
$$iQ_i = \left(n + \frac{1}{3}\right)\times p_i + \frac{1}{3}$$
Hyndman and Fan, 9th version (Hyndman & Fan, 1996, p. 364):
$$iQ_i = \left(n + \frac{1}{4}\right)\times p_i + \frac{3}{8}$$
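To compare the six formulas side by side, the sketch below evaluates each of them for a hypothetical sample of n = 10 values and the first quartile (p = 0.25). The dictionary keys are the method names the function accepts; the sample size is an assumption for illustration.

```python
n = 10     # hypothetical number of values
p = 0.25   # first quartile

# one entry per indexing formula, keyed by the function's method names
formulas = {
    "sas1": n * p,
    "sas4": (n + 1) * p,
    "hl": n * p + 1 / 2,
    "excel": (n - 1) * p + 1,
    "hf8": (n + 1 / 3) * p + 1 / 3,
    "hf9": (n + 1 / 4) * p + 3 / 8,
}

for method, index in formulas.items():
    print(f"{method:>5}: {index:.4f}")
```

This prints 2.5000, 2.7500, 3.0000, 3.2500, 2.9167 and 2.9375 respectively, so depending on the method the first quartile of 10 values is placed anywhere between the 2nd and the 4th sorted value.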
References
Gumbel, E. J. (1939). La Probabilité des Hypothèses. Comptes Rendus de l'Académie des Sciences, 209, 645–647.
Hazen, A. (1914). Storage to be provided in impounding municipal water supply. Transactions of the American Society of Civil Engineers, 77(1), 1539–1640. https://doi.org/10.1061/taceat.0002563
Hogg, R. V., & Ledolter, J. (1992). Applied statistics for engineers and physical scientists (2nd int.). Macmillan.
Hyndman, R. J., & Fan, Y. (1996). Sample quantiles in statistical packages. The American Statistician, 50(4), 361–365. https://doi.org/10.2307/2684934
SAS. (1990). SAS procedures guide: Version 6 (3rd ed.). SAS Institute.
Snedecor, G. W. (1940). Statistical methods applied to experiments in agriculture and biology (3rd ed.). The Iowa State College Press.
Author
Made by P. Stikker
Please visit: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet