Module stikpetP.helper.help_quantileIndex
import pandas as pd
import math
from .help_quantileIndexing import he_quantileIndexing

def he_quantileIndex(data, k=4, indexMethod="sas1", qLfrac="linear", qLint="int", qHfrac="linear", qHint="int"):
    '''
    Quantile Numeric Based on Index

    Helper function for **me_quantiles()** to return the quantiles with different methods of rounding.

    Parameters
    ----------
    data : pandas series with numeric values
    k : number of quantiles
    indexMethod : optional, which type of indexing to use
    qLfrac : optional, which type of rounding to use for quantiles below 50 percent
    qLint : optional, whether to use the integer or the midpoint method for quantiles below 50 percent
    qHfrac : optional, which type of rounding to use for quantiles equal to or above 50 percent
    qHint : optional, whether to use the integer or the midpoint method for quantiles equal to or above 50 percent

    qLfrac and qHfrac can be set to: "linear", "down", "up", "bankers", "nearest", "halfdown", or "midpoint".

    qLint and qHint can be set to: "int" or "midpoint".

    Returns
    -------
    quantiles : pandas series with the k + 1 quantile values (index 0 through k)

    Notes
    -----
    If **the index is an integer**, that integer is usually used to find the corresponding value in the sorted data. This can be done by setting *qLint* and/or *qHint* to **int**.

    A few methods instead take the midpoint between the found index and the next one, i.e. they use:
    $$iQ_i = iQ_i + \\frac{1}{2}$$

    This can be done by setting *qLint* and/or *qHint* to **midpoint**.

    If the index has a fractional part, we could use linear interpolation. It can be written as:
    $$X\\left[\\lfloor iQ_i \\rfloor\\right] + \\frac{iQ_i - \\lfloor iQ_i \\rfloor}{\\lceil iQ_i \\rceil - \\lfloor iQ_i \\rfloor} \\times \\left(X\\left[\\lceil iQ_i \\rceil\\right] - X\\left[\\lfloor iQ_i \\rfloor\\right]\\right)$$

    Where:

    * \\(X\\left[x\\right]\\) is the x-th score of the sorted scores
    * \\(\\lfloor\\dots\\rfloor\\) is the function to always round down
    * \\(\\lceil\\dots\\rceil\\) is the function to always round up

    Alternatively, the index can be rounded, and there are several versions of rounding. Besides the already mentioned round down (set *qLfrac* and/or *qHfrac* to **down**) and round up (set *qLfrac* and/or *qHfrac* to **up**) versions:

    * \\(\\lfloor\\dots\\rceil\\) to indicate rounding to the nearest integer, with a value ending in .5 going to the nearest even integer. A value of 2.5 gets rounded to 2, while 1.5 also gets rounded to 2. This is also referred to as the *bankers* method. Set *qLfrac* and/or *qHfrac* to **bankers**.
    * \\(\\left[\\dots\\right]\\) to indicate rounding to the nearest integer, where a value that ends with .5 is always rounded up. Set *qLfrac* and/or *qHfrac* to **nearest**.
    * \\(\\left< \\dots\\right>\\) to indicate rounding to the nearest integer, where a value that ends with .5 is always rounded down. Set *qLfrac* and/or *qHfrac* to **halfdown**.

    Or use the midpoint between the two neighbouring indices, i.e.:
    $$\\frac{\\lfloor iQ_i \\rfloor + \\lceil iQ_i \\rceil}{2}$$

    Set *qLfrac* and/or *qHfrac* to **midpoint**.

    Author
    ------
    Made by P. Stikker

    Please visit: https://PeterStatistics.com

    YouTube channel: https://www.youtube.com/stikpet

    '''
    n = len(data)
    iqs = he_quantileIndexing(data, k=k, method=indexMethod)
    quantiles = pd.Series(dtype='float64')

    for i in range(k+1):
        if iqs[i] < 0.5:
            if round(iqs[i]) == iqs[i]:
                # index is integer
                if qLint == "int":
                    quantiles.at[i] = iqs[i]
                elif qLint == "midpoint":
                    quantiles.at[i] = iqs[i] + 1/2
            else:
                # index has fraction
                if qLfrac == "linear":
                    quantiles.at[i] = iqs[i]
                elif qLfrac == "down":
                    quantiles.at[i] = math.floor(iqs[i])
                elif qLfrac == "up":
                    quantiles.at[i] = math.ceil(iqs[i])
                elif qLfrac == "bankers":
                    quantiles.at[i] = round(iqs[i])
                elif qLfrac == "nearest":
                    quantiles.at[i] = int(iqs[i] + 0.5)
                elif qLfrac == "halfdown":
                    if iqs[i] + 0.5 == round(iqs[i] + 0.5):
                        quantiles.at[i] = math.floor(iqs[i])
                    else:
                        quantiles.at[i] = round(iqs[i])
                elif qLfrac == "midpoint":
                    quantiles.at[i] = (math.floor(iqs[i]) + math.ceil(iqs[i])) / 2

        elif iqs[i] >= 0.5:
            if round(iqs[i]) == iqs[i]:
                # index is integer
                if qHint == "int":
                    quantiles.at[i] = iqs[i]
                elif qHint == "midpoint":
                    quantiles.at[i] = iqs[i] + 1/2
            else:
                # index has fraction
                if qHfrac == "linear":
                    quantiles.at[i] = iqs[i]
                elif qHfrac == "down":
                    quantiles.at[i] = math.floor(iqs[i])
                elif qHfrac == "up":
                    quantiles.at[i] = math.ceil(iqs[i])
                elif qHfrac == "bankers":
                    quantiles.at[i] = round(iqs[i])
                elif qHfrac == "nearest":
                    quantiles.at[i] = int(iqs[i] + 0.5)
                elif qHfrac == "halfdown":
                    if iqs[i] + 0.5 == round(iqs[i] + 0.5):
                        quantiles.at[i] = math.floor(iqs[i])
                    else:
                        quantiles.at[i] = round(iqs[i])
                elif qHfrac == "midpoint":
                    quantiles.at[i] = (math.floor(iqs[i]) + math.ceil(iqs[i])) / 2

        # keep the index within the valid range 1..n
        if quantiles.at[i] < 1:
            quantiles.at[i] = 1
        elif quantiles.at[i] > n:
            quantiles.at[i] = n

        # look up the value at the (possibly fractional) 1-based index
        qi = quantiles.at[i]
        qLow = math.floor(qi)
        qHigh = math.ceil(qi)

        if qLow==qHigh:
            quantiles.at[i] = data[int(qLow-1)]
        else:
            # Linear interpolation:
            quantiles.at[i] = data[int(qLow-1)] + (qi - qLow)/(qHigh - qLow)*(data[int(qHigh-1)] - data[int(qLow-1)])

    return quantiles
Functions
def he_quantileIndex(data, k=4, indexMethod='sas1', qLfrac='linear', qLint='int', qHfrac='linear', qHint='int')
Quantile Numeric Based on Index
Helper function for me_quantiles() to return the quantiles with different methods of rounding.
Parameters
data : pandas series with numeric values
k : number of quantiles
indexMethod : optional, which type of indexing to use
qLfrac : optional, which type of rounding to use for quantiles below 50 percent
qLint : optional, whether to use the integer or the midpoint method for quantiles below 50 percent
qHfrac : optional, which type of rounding to use for quantiles equal to or above 50 percent
qHint : optional, whether to use the integer or the midpoint method for quantiles equal to or above 50 percent
qLfrac and qHfrac can be set to: "linear", "down", "up", "bankers", "nearest", "halfdown", or "midpoint".
qLint and qHint can be set to: "int" or "midpoint".
Returns
quantiles : pandas series with the k + 1 quantile values (index 0 through k)
Notes
If the index is an integer, that integer is usually used to find the corresponding value in the sorted data. This can be done by setting qLint and/or qHint to int.
A few methods instead take the midpoint between the found index and the next one, i.e. they use:
$$iQ_i = iQ_i + \frac{1}{2}$$
This can be done by setting qLint and/or qHint to midpoint.
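As a small illustration of the two integer-index options, the sketch below uses a made-up list of sorted scores and a 1-based index. The variable names are hypothetical, and it assumes the half index is resolved by interpolating between the two neighbouring scores, as the source listing does in its final step.

```python
# Hypothetical sorted scores; indices in the text are 1-based.
sorted_scores = [1, 3, 5, 8, 10]
iq = 2  # an index without a fractional part

# "int": use the score at the index itself (the 2nd score).
value_int = sorted_scores[iq - 1]

# "midpoint": the index becomes iq + 1/2, i.e. halfway between
# the 2nd and the 3rd score.
iq_mid = iq + 1 / 2
value_mid = (sorted_scores[iq - 1] + sorted_scores[iq]) / 2

print(value_int)  # 3
print(iq_mid)     # 2.5
print(value_mid)  # 4.0
```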
If the index has a fractional part, we could use linear interpolation. It can be written as:
$$X\left[\lfloor iQ_i \rfloor\right] + \frac{iQ_i - \lfloor iQ_i \rfloor}{\lceil iQ_i \rceil - \lfloor iQ_i \rfloor} \times \left(X\left[\lceil iQ_i \rceil\right] - X\left[\lfloor iQ_i \rfloor\right]\right)$$
Where:
* \(X\left[x\right]\) is the x-th score of the sorted scores
* \(\lfloor\dots\rfloor\) is the function to always round down
* \(\lceil\dots\rceil\) is the function to always round up
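The interpolation formula can be sketched in a few lines of Python. The helper name `interpolate_at` and the example scores are assumptions made for illustration; the helper takes a plain sorted list and a 1-based index that may have a fractional part.

```python
import math

def interpolate_at(sorted_scores, iq):
    """Value at the 1-based, possibly fractional, index iq (illustrative helper)."""
    lo = math.floor(iq)
    hi = math.ceil(iq)
    x_lo = sorted_scores[lo - 1]
    if lo == hi:
        # No fractional part: the index points directly at a score.
        return x_lo
    x_hi = sorted_scores[hi - 1]
    # Move the fractional distance from the lower score towards the upper one.
    return x_lo + (iq - lo) / (hi - lo) * (x_hi - x_lo)

scores = [1, 3, 5, 8, 10]
print(interpolate_at(scores, 3.25))  # 5 + 0.25 * (8 - 5) = 5.75
print(interpolate_at(scores, 2))     # 3
```

Since the floor and ceiling of a fractional index always differ by one, the denominator in the formula is 1 whenever interpolation is actually needed.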
Alternatively, the index can be rounded, and there are several versions of rounding. Besides the already mentioned round down (set qLfrac and/or qHfrac to down) and round up (set qLfrac and/or qHfrac to up) versions:
- \(\lfloor\dots\rceil\) to indicate rounding to the nearest integer, with a value ending in .5 going to the nearest even integer. A value of 2.5 gets rounded to 2, while 1.5 also gets rounded to 2. This is also referred to as the bankers method. Set qLfrac and/or qHfrac to bankers.
- \(\left[\dots\right]\) to indicate rounding to the nearest integer, where a value that ends with .5 is always rounded up. Set qLfrac and/or qHfrac to nearest.
- \(\left< \dots\right>\) to indicate rounding to the nearest integer, where a value that ends with .5 is always rounded down. Set qLfrac and/or qHfrac to halfdown.
Or use the midpoint between the two neighbouring indices, i.e.:
$$\frac{\lfloor iQ_i \rfloor + \lceil iQ_i \rceil}{2}$$
Set qLfrac and/or qHfrac to midpoint.
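To compare the rounding options side by side, here is a minimal sketch. The function `round_index` is a hypothetical helper, not part of the package, and it assumes positive indices. For nearest and halfdown it uses floor/ceil shortcuts rather than the branches in the source listing, which should give the same result for positive values.

```python
import math

def round_index(iq, method):
    """Round a positive fractional index iq with the named method (illustrative)."""
    if method == "down":
        return math.floor(iq)
    if method == "up":
        return math.ceil(iq)
    if method == "bankers":
        # Python's built-in round() sends ties to the nearest even integer.
        return round(iq)
    if method == "nearest":
        # Ties (.5) go up.
        return math.floor(iq + 0.5)
    if method == "halfdown":
        # Ties (.5) go down.
        return math.ceil(iq - 0.5)
    if method == "midpoint":
        return (math.floor(iq) + math.ceil(iq)) / 2
    raise ValueError(f"unknown method: {method}")

for method in ("down", "up", "bankers", "nearest", "halfdown", "midpoint"):
    print(method, round_index(1.5, method), round_index(2.5, method))
```

The methods only disagree on ties: for an index such as 2.6, bankers, nearest, and halfdown all return 3.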
Author
Made by P. Stikker
Please visit: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet