Module stikpetP.helper.help_quantileIndex

import pandas as pd
import math
from .help_quantileIndexing import he_quantileIndexing

def he_quantileIndex(data, k=4, indexMethod="sas1", qLfrac="linear", qLint="int", qHfrac="linear", qHint="int"):
    '''
    Quantile Numeric Based on Index
    
    Helper function for **me_quantiles()** to return the quantiles with different methods of rounding.
    
    Parameters
    ----------
    data : pandas series with numeric values
    k : optional number of quantiles (default is 4, i.e. quartiles)
    indexMethod : optional, which type of indexing to use
    qLfrac : optional, which type of rounding to use for a fractional index, for quantiles below 50 percent
    qLint : optional, whether to use the integer or the midpoint method for an integer index, for quantiles below 50 percent
    qHfrac : optional, which type of rounding to use for a fractional index, for quantiles equal to or above 50 percent
    qHint : optional, whether to use the integer or the midpoint method for an integer index, for quantiles equal to or above 50 percent
    
    qLfrac and qHfrac can be set to: "linear", "down", "up", "bankers", "nearest", "halfdown", or "midpoint".
    
    qLint and qHint can be set to: "int" or "midpoint".
    
    Returns
    -------
    quantiles : pandas series with the quantile values, one for each quantile from 0 to k
    
    Notes
    -----
    If **the index is an integer**, that integer is often used directly to find the corresponding value in the sorted data. This is done by setting *qLint* and/or *qHint* to **int**.   
    However, a few methods instead take the midpoint between the found index and the next one, i.e. they use:

    $$iQ_i = iQ_i + \\frac{1}{2}$$

    This is done by setting *qLint* and/or *qHint* to **midpoint**.
    
    If the index has a fractional part, we could use linear interpolation. It can be written as:

    $$X\\left[\\lfloor iQ_i \\rfloor\\right] + \\frac{iQ_i - \\lfloor iQ_i \\rfloor}{\\lceil iQ_i \\rceil - \\lfloor iQ_i \\rfloor} \\times \\left(X\\left[\\lceil iQ_i \\rceil\\right] - X\\left[\\lfloor iQ_i \\rfloor\\right]\\right)$$

    Where:
    * \\(X\\left[x\\right]\\) is the x-th score of the sorted scores 
    * \\(\\lfloor\\dots\\rfloor\\) is the function to always round down
    * \\(\\lceil\\dots\\rceil\\) is the function to always round up

    Alternatively, the index can be rounded, and there are different versions of rounding. Besides the already mentioned rounding down (use *qLfrac* and/or *qHfrac* as **down**) and rounding up (use *qLfrac* and/or *qHfrac* as **up**), there are:

    * \\(\\lfloor\\dots\\rceil\\) to indicate rounding to the nearest integer, where a value ending with .5 is rounded to the nearest even integer. A value of 2.5 gets rounded to 2, while 1.5 also gets rounded to 2. This is also referred to as the *bankers* method. Use *qLfrac* and/or *qHfrac* as **bankers**.
    * \\(\\left[\\dots\\right]\\) to indicate rounding to the nearest integer, where a value ending with .5 is always rounded up. Use *qLfrac* and/or *qHfrac* as **nearest**.
    * \\(\\left< \\dots\\right>\\) to indicate rounding to the nearest integer, where a value ending with .5 is always rounded down. Use *qLfrac* and/or *qHfrac* as **halfdown**.

    Or the midpoint can be used again, i.e.:

    $$\\frac{\\lfloor iQ_i \\rfloor + \\lceil iQ_i \\rceil}{2}$$
    
    Use *qLfrac* and/or *qHfrac* as **midpoint**.
    
    Author
    ------
    Made by P. Stikker
    
    Please visit: https://PeterStatistics.com
    
    YouTube channel: https://www.youtube.com/stikpet
    
    '''
    n = len(data)
    iqs = he_quantileIndexing(data, k=k, method=indexMethod)
    quantiles = pd.Series(dtype='float64')
    
    for i in range(k+1):
        # Quantiles below 50 percent use the qL settings, all others the qH settings.
        # The proportion of the i-th quantile is i/k; iqs[i] itself is a 1-based index.
        if i / k < 0.5:
            fracMethod, intMethod = qLfrac, qLint
        else:
            fracMethod, intMethod = qHfrac, qHint

        if round(iqs[i]) == iqs[i]:
            # index is integer
            if intMethod == "int":
                quantiles.at[i] = iqs[i]
            elif intMethod == "midpoint":
                quantiles.at[i] = iqs[i] + 1/2
        else:
            # index has fraction
            if fracMethod == "linear":
                quantiles.at[i] = iqs[i]
            elif fracMethod == "down":
                quantiles.at[i] = math.floor(iqs[i])
            elif fracMethod == "up":
                quantiles.at[i] = math.ceil(iqs[i])
            elif fracMethod == "bankers":
                quantiles.at[i] = round(iqs[i])
            elif fracMethod == "nearest":
                quantiles.at[i] = int(iqs[i] + 0.5)
            elif fracMethod == "halfdown":
                if iqs[i] + 0.5 == round(iqs[i] + 0.5):
                    quantiles.at[i] = math.floor(iqs[i])
                else:
                    quantiles.at[i] = round(iqs[i])
            elif fracMethod == "midpoint":
                quantiles.at[i] = (math.floor(iqs[i]) + math.ceil(iqs[i])) / 2
        
        # keep the index within the valid range of 1 to n
        if quantiles.at[i] < 1:
            quantiles.at[i] = 1
        elif quantiles.at[i] > n:
            quantiles.at[i] = n
                
        qi = quantiles.at[i]
        qLow = math.floor(qi)
        qHigh = math.ceil(qi)

        if qLow==qHigh:
            quantiles.at[i] = data[int(qLow-1)]
        else:
            #Linear interpolation:
            quantiles.at[i] = data[int(qLow-1)] + (qi - qLow)/(qHigh - qLow)*(data[int(qHigh-1)] - data[int(qLow-1)])

    return quantiles

Functions

def he_quantileIndex(data, k=4, indexMethod='sas1', qLfrac='linear', qLint='int', qHfrac='linear', qHint='int')

Quantile Numeric Based on Index

Helper function for me_quantiles() to return the quantiles with different methods of rounding.

Parameters

data : pandas series with numeric values

k : optional number of quantiles (default is 4, i.e. quartiles)

indexMethod : optional, which type of indexing to use

qLfrac : optional, which type of rounding to use for a fractional index, for quantiles below 50 percent

qLint : optional, whether to use the integer or the midpoint method for an integer index, for quantiles below 50 percent

qHfrac : optional, which type of rounding to use for a fractional index, for quantiles equal to or above 50 percent

qHint : optional, whether to use the integer or the midpoint method for an integer index, for quantiles equal to or above 50 percent

qLfrac and qHfrac can be set to: "linear", "down", "up", "bankers", "nearest", "halfdown", or "midpoint".

qLint and qHint can be set to: "int" or "midpoint".

Returns

quantiles : pandas series with the quantile values, one for each quantile from 0 to k

Notes

If the index is an integer, that integer is often used directly to find the corresponding value in the sorted data. This is done by setting qLint and/or qHint to int.
However, a few methods instead take the midpoint between the found index and the next one, i.e. they use:

$$iQ_i = iQ_i + \frac{1}{2}$$

This is done by setting qLint and/or qHint to midpoint.
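The two options for an integer index can be illustrated with a minimal sketch. The helper name `value_at_integer_index` is hypothetical and not part of the package; it assumes a plain sorted list and a 1-based index, and it resolves the half index by averaging the two neighbouring scores, which is what linear interpolation gives at a fraction of one half.

```python
def value_at_integer_index(sorted_scores, index, method="int"):
    """Return the score for a whole-number, 1-based index (hypothetical helper)."""
    n = len(sorted_scores)
    if method == "int":
        # use the score found at the index itself
        return sorted_scores[index - 1]
    if method == "midpoint":
        # index + 1/2 lies halfway between this score and the next one;
        # at the last index there is no next score, so the last score is used
        nxt = min(index + 1, n)
        return (sorted_scores[index - 1] + sorted_scores[nxt - 1]) / 2
    raise ValueError(f"unknown method: {method}")


scores = [1, 3, 4, 8, 10]
print(value_at_integer_index(scores, 2, "int"))       # 3
print(value_at_integer_index(scores, 2, "midpoint"))  # 3.5
```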

If the index has a fractional part, we could use linear interpolation. It can be written as:

$$X\left[\lfloor iQ_i \rfloor\right] + \frac{iQ_i - \lfloor iQ_i \rfloor}{\lceil iQ_i \rceil - \lfloor iQ_i \rfloor} \times \left(X\left[\lceil iQ_i \rceil\right] - X\left[\lfloor iQ_i \rfloor\right]\right)$$

Where:

  • \(X\left[x\right]\) is the x-th score of the sorted scores
  • \(\lfloor\dots\rfloor\) is the function to always round down
  • \(\lceil\dots\rceil\) is the function to always round up
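As a rough illustration of the interpolation formula, the sketch below applies it to a plain sorted list with a 1-based index. The function name `interpolate` is an assumption made for this example; the case of a whole-number index is handled separately because the denominator would otherwise be zero.

```python
import math


def interpolate(sorted_scores, iq):
    """Linear interpolation at a 1-based, possibly fractional index (hypothetical helper)."""
    lo = math.floor(iq)
    hi = math.ceil(iq)
    if lo == hi:
        # whole-number index: no interpolation needed
        return sorted_scores[lo - 1]
    x_lo = sorted_scores[lo - 1]
    x_hi = sorted_scores[hi - 1]
    # move the fractional part of the way from the lower to the upper score
    return x_lo + (iq - lo) / (hi - lo) * (x_hi - x_lo)


scores = [1, 3, 4, 8, 10]
print(interpolate(scores, 2.25))  # 3.25
print(interpolate(scores, 3.5))   # 6.0
```

With an index of 2.25 the result lies a quarter of the way between the second score (3) and the third score (4).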

Alternatively, the index can be rounded, and there are different versions of rounding. Besides the already mentioned rounding down (use qLfrac and/or qHfrac as down) and rounding up (use qLfrac and/or qHfrac as up), there are:

  • \(\lfloor\dots\rceil\) to indicate rounding to the nearest integer, where a value ending with .5 is rounded to the nearest even integer. A value of 2.5 gets rounded to 2, while 1.5 also gets rounded to 2. This is also referred to as the bankers method. Use qLfrac and/or qHfrac as bankers.
  • \(\left[\dots\right]\) to indicate rounding to the nearest integer, where a value ending with .5 is always rounded up. Use qLfrac and/or qHfrac as nearest.
  • \(\left< \dots\right>\) to indicate rounding to the nearest integer, where a value ending with .5 is always rounded down. Use qLfrac and/or qHfrac as halfdown.
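The rounding variants differ only in how they treat a value ending with .5, which a small sketch can make concrete. The helper `round_index` is hypothetical and assumes a positive index; it uses `math.ceil(iq - 0.5)` for the half-down rule, which is one possible formulation and not necessarily how the package computes it.

```python
import math


def round_index(iq, method):
    """Round a positive fractional index with the named rule (hypothetical helper)."""
    if method == "down":
        return math.floor(iq)
    if method == "up":
        return math.ceil(iq)
    if method == "bankers":
        # Python's built-in round() rounds halves to the nearest even integer
        return round(iq)
    if method == "nearest":
        # halves are always rounded up
        return math.floor(iq + 0.5)
    if method == "halfdown":
        # halves are always rounded down
        return math.ceil(iq - 0.5)
    raise ValueError(f"unknown method: {method}")


for method in ["down", "up", "bankers", "nearest", "halfdown"]:
    print(method, round_index(1.5, method), round_index(2.5, method))
```

For 1.5 and 2.5 the bankers rule gives 2 and 2, the nearest rule gives 2 and 3, and the half-down rule gives 1 and 2.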

Or the midpoint can be used again, i.e.:

$$\frac{\lfloor iQ_i \rfloor + \lceil iQ_i \rceil}{2}$$

Use qLfrac and/or qHfrac as midpoint.
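A short sketch of the midpoint rule for a fractional index is given below. The name `midpoint_index` is an assumption for illustration only. Any fractional index between two whole numbers maps to the same half index, regardless of how close it is to either side.

```python
import math


def midpoint_index(iq):
    """Replace a fractional index by the midpoint of its floor and ceiling (hypothetical helper)."""
    return (math.floor(iq) + math.ceil(iq)) / 2


print(midpoint_index(2.25))  # 2.5
print(midpoint_index(2.9))   # 2.5
```

The resulting half index then falls exactly between two scores, so the returned quantile is the average of those two scores.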

Author

Made by P. Stikker

Please visit: https://PeterStatistics.com

YouTube channel: https://www.youtube.com/stikpet
