Module stikpetP.measures.meas_hodges_lehmann_os

import pandas as pd
from statistics import median

def me_hodges_lehmann_os(scores, levels=None):
    '''
    Hodges-Lehmann Estimate (One-Sample)
    ----------------------------------------
    The Hodges-Lehmann Estimate (Hodges & Lehmann, 1963) for a one-sample scenario is the median of the Walsh averages. The Walsh averages (Walsh, 1949a, 1949b) are the averages of every possible pair of scores, formed by combining each score with every other score. Each pair is counted only once: the second and fifth score form the same pair as the fifth and second, so only one of these is used. Self-pairs, e.g. the third score paired with itself, are also included.
    
    In the one-sample case it is therefore a measure of central tendency, sometimes referred to as the pseudo-median.

    The measure is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Measures/HodgesLehmannOS.html)
    
    Parameters
    ----------
    scores : pandas Series or list
        the scores
    levels : dictionary, optional
        mapping of each label in `scores` to its numeric rank
    
    Returns
    -------
    HL : float
        the Hodges-Lehmann Estimate
    
    Notes
    ------
    The formula used (Hodges & Lehmann, 1963, p. 599):
    
    $$HL = \\text{median}\\left(\\frac{x_i + x_j}{2} | 1 \\leq i \\leq j \\leq n\\right)$$

    Before, After and Alternatives
    ------------------------------
    Before this measure you might want an impression using a frequency table or a visualisation:
    * [tab_frequency](../other/table_frequency.html#tab_frequency) for a frequency table
    * [vi_bar_stacked_single](../visualisations/vis_bar_stacked_single.html#vi_bar_stacked_single) for Single Stacked Bar-Chart
    * [vi_bar_dual_axis](../visualisations/vis_bar_dual_axis.html#vi_bar_dual_axis) for Dual-Axis Bar Chart

    After this you might want some other descriptive measures:
    * [me_consensus](../measures/meas_consensus.html#me_consensus) for the Consensus
    * [me_median](../measures/meas_median.html#me_median) for the Median
    * [me_quantiles](../measures/meas_quantiles.html#me_quantiles) for Quantiles
    * [me_quartiles](../measures/meas_quartiles.html#me_quartiles) for Quartiles / Hinges
    * [me_quartile_range](../measures/meas_quartile_range.html#me_quartile_range) for Interquartile Range, Semi-Interquartile Range and Mid-Quartile Range
    
    or perform a test:
    * [ts_sign_os](../tests/test_sign_os.html#ts_sign_os) for One-Sample Sign Test
    * [ts_trinomial_os](../tests/test_trinomial_os.html#ts_trinomial_os) for One-Sample Trinomial Test
    * [ts_wilcoxon_os](../tests/test_wilcoxon_os.html#ts_wilcoxon_os) for Wilcoxon Signed Rank Test (One-Sample)
    
    References
    ----------
    Hodges, J. L., & Lehmann, E. L. (1963). Estimates of location based on rank tests. *The Annals of Mathematical Statistics, 34*(2), 598–611. doi:10.1214/aoms/1177704172
    
    Monahan, J. F. (1984). Algorithm 616: Fast computation of the Hodges-Lehmann location estimator. *ACM Transactions on Mathematical Software, 10*(3), 265–270. doi:10.1145/1271.319414
    
    Walsh, J. E. (1949a). Applications of some significance tests for the median which are valid under very general conditions. *Journal of the American Statistical Association, 44*(247), 342–355. doi:10.1080/01621459.1949.10483311
    
    Walsh, J. E. (1949b). Some significance tests for the median which are valid under very general conditions. *The Annals of Mathematical Statistics, 20*(1), 64–81. doi:10.1214/aoms/1177730091

    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076

    Examples
    --------
    Example 1: Text Pandas Series
    >>> import pandas as pd
    >>> student_df = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv', sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
    >>> ex1 = student_df['Teach_Motivate']
    >>> order = {"Fully Disagree":1, "Disagree":2, "Neither disagree nor agree":3, "Agree":4, "Fully agree":5}
    >>> me_hodges_lehmann_os(ex1, levels=order)
    np.float64(2.5)
    
    Example 2: Numeric data
    >>> ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
    >>> me_hodges_lehmann_os(ex2)
    3.5
    
    Example 3: Text data with a specified order
    >>> ex3 = ["a", "b", "f", "d", "e", "c"]
    >>> order = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6}
    >>> me_hodges_lehmann_os(ex3, levels=order)
    np.float64(3.5)
    
    '''
    if isinstance(scores, list):
        scores = pd.Series(scores)
    
    #remove missing values
    scores = scores.dropna()

    #apply levels
    if levels is not None:
        scores = scores.map(levels).astype('Int8')
    else:
        scores = pd.to_numeric(scores)

    #sample size
    n = len(scores)

    #convert to list
    scores = list(scores)
    
    #Walsh Averages
    walsh = []
    for i in range(0, n):
        for j in range(i, n):
            walsh.append((scores[i] + scores[j])/2)

    #Hodges-Lehmann Estimate
    HL = median(walsh)
    return HL

Functions

def me_hodges_lehmann_os(scores, levels=None)

Hodges-Lehmann Estimate (One-Sample)

The Hodges-Lehmann Estimate (Hodges & Lehmann, 1963) for a one-sample scenario is the median of the Walsh averages. The Walsh averages (Walsh, 1949a, 1949b) are the averages of every possible pair of scores, formed by combining each score with every other score. Each pair is counted only once: the second and fifth score form the same pair as the fifth and second, so only one of these is used. Self-pairs, e.g. the third score paired with itself, are also included.

In the one-sample case it is therefore a measure of central tendency, sometimes referred to as the pseudo-median.
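The pairing rule described above can be sketched in a few lines. This is a minimal illustration, not the package's own code: the helper name `walsh_averages` and the sample values are assumptions made for the example.

```python
from itertools import combinations_with_replacement
from statistics import median

def walsh_averages(values):
    """Return the Walsh averages of a sequence of numbers (hypothetical helper)."""
    # combinations_with_replacement yields every unordered pair exactly once
    # and also pairs each value with itself (the self-pairs)
    return [(a + b) / 2 for a, b in combinations_with_replacement(values, 2)]

sample = [1, 3, 8]                 # made-up data for illustration
averages = walsh_averages(sample)
print(averages)                    # [1.0, 2.0, 4.5, 3.0, 5.5, 8.0]
print(median(averages))            # 3.75
```

Note that the pseudo-median (3.75) differs here from the ordinary median of the sample (3), because the large value 8 enters several of the pairwise averages.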

The measure is also described at PeterStatistics.com

Parameters

scores : pandas Series or list
the scores
levels : dictionary, optional
mapping of each label in scores to its numeric rank
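As a rough sketch of what passing a dictionary as `levels` does to the data, the snippet below repeats the preprocessing steps from the function's source (drop missing values, then map labels to numbers) on a small series. The labels and the coding are made up for the illustration.

```python
import pandas as pd

# hypothetical ordinal responses, including one missing value
responses = pd.Series(["Agree", "Disagree", None, "Fully agree"])
# hypothetical coding of each label to its rank
coding = {"Disagree": 2, "Agree": 4, "Fully agree": 5}

# missing values are dropped first, then each label is replaced by its rank
numeric = responses.dropna().map(coding).astype("Int8")
print(numeric.tolist())   # the ranks 4, 2 and 5, in the original order
```

With `Series.map`, a label that does not appear in the dictionary becomes a missing value rather than raising an error, so it is worth checking that the keys match the labels in the data exactly (including capitalisation).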

Returns

HL : float
the Hodges-Lehmann Estimate

Notes

The formula used (Hodges & Lehmann, 1963, p. 599):

HL = \text{median}\left(\frac{x_i + x_j}{2} | 1 \leq i \leq j \leq n\right)
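A small worked example of the formula, using made-up scores chosen only for illustration. With n scores there are n(n + 1)/2 Walsh averages; when that count is even, the median is the mean of the two middle values.

```latex
x = (2, 4, 9), \quad n = 3, \quad \frac{n(n+1)}{2} = 6 \text{ Walsh averages}

\frac{2+2}{2} = 2, \quad \frac{2+4}{2} = 3, \quad \frac{2+9}{2} = 5.5, \quad
\frac{4+4}{2} = 4, \quad \frac{4+9}{2} = 6.5, \quad \frac{9+9}{2} = 9

HL = \text{median}(2, 3, 4, 5.5, 6.5, 9) = \frac{4 + 5.5}{2} = 4.75
```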

Before, After and Alternatives

Before this measure you might want an impression using a frequency table or a visualisation:

* tab_frequency for a frequency table
* vi_bar_stacked_single for Single Stacked Bar-Chart
* vi_bar_dual_axis for Dual-Axis Bar Chart

After this you might want some other descriptive measures:

* me_consensus for the Consensus
* me_median for the Median
* me_quantiles for Quantiles
* me_quartiles for Quartiles / Hinges
* me_quartile_range for Interquartile Range, Semi-Interquartile Range and Mid-Quartile Range

or perform a test:

* ts_sign_os for One-Sample Sign Test
* ts_trinomial_os for One-Sample Trinomial Test
* ts_wilcoxon_os for Wilcoxon Signed Rank Test (One-Sample)

References

Hodges, J. L., & Lehmann, E. L. (1963). Estimates of location based on rank tests. The Annals of Mathematical Statistics, 34(2), 598–611. doi:10.1214/aoms/1177704172

Monahan, J. F. (1984). Algorithm 616: Fast computation of the Hodges-Lehmann location estimator. ACM Transactions on Mathematical Software, 10(3), 265–270. doi:10.1145/1271.319414

Walsh, J. E. (1949a). Applications of some significance tests for the median which are valid under very general conditions. Journal of the American Statistical Association, 44(247), 342–355. doi:10.1080/01621459.1949.10483311

Walsh, J. E. (1949b). Some significance tests for the median which are valid under very general conditions. The Annals of Mathematical Statistics, 20(1), 64–81. doi:10.1214/aoms/1177730091

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076

Examples

Example 1: Text Pandas Series

>>> import pandas as pd
>>> student_df = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv', sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = student_df['Teach_Motivate']
>>> order = {"Fully Disagree":1, "Disagree":2, "Neither disagree nor agree":3, "Agree":4, "Fully agree":5}
>>> me_hodges_lehmann_os(ex1, levels=order)
np.float64(2.5)

Example 2: Numeric data

>>> ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
>>> me_hodges_lehmann_os(ex2)
3.5

Example 3: Text data with a specified order

>>> ex3 = ["a", "b", "f", "d", "e", "c"]
>>> order = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6}
>>> me_hodges_lehmann_os(ex3, levels=order)
np.float64(3.5)
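The results of Examples 2 and 3 can be cross-checked independently of the package. The following is a sketch using NumPy, not the function's own implementation; the name `hodges_lehmann_np` is an assumption made for the illustration, and the Example 3 data is entered already coded as numbers.

```python
import numpy as np

def hodges_lehmann_np(values):
    """One-sample Hodges-Lehmann estimate via NumPy (hypothetical helper)."""
    x = np.asarray(values, dtype=float)
    # index pairs (i, j) with i <= j: the upper triangle including the diagonal
    i, j = np.triu_indices(len(x))
    # median of all pairwise averages (the Walsh averages)
    return float(np.median((x[i] + x[j]) / 2))

ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
print(hodges_lehmann_np(ex2))                  # 3.5, as in Example 2
print(hodges_lehmann_np([1, 2, 6, 4, 5, 3]))   # 3.5, as in Example 3 after coding
```

Like the nested loop in the source, this sketch still builds all n(n + 1)/2 averages, so memory use grows quadratically with the sample size. Monahan (1984), listed in the references, describes a faster algorithm for large samples.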