Module stikpetP.measures.meas_hodges_lehmann_os
import pandas as pd
from statistics import median
def me_hodges_lehmann_os(scores, levels=None):
    '''
Hodges-Lehmann Estimate (One-Sample)
----------------------------------------
The Hodges-Lehmann Estimate (Hodges & Lehmann, 1963) for a one-sample scenario is the median of the Walsh averages. The Walsh averages (Walsh, 1949a, 1949b) are the averages of every possible pair of scores. Each pair is counted only once: taking the second and fifth score is the same as taking the fifth and the second, so only one of these is used. Self-pairs, e.g. the third score with the third score, are also included.
In the one-sample case it is therefore a measure of central tendency, sometimes referred to as the pseudo-median.
The measure is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Measures/HodgesLehmannOS.html)
Parameters
----------
scores : dataframe or list
the scores
levels : list or dictionary, optional
the scores in order
Returns
-------
HL : float
the Hodges-Lehmann Estimate
Notes
------
The formula used (Hodges & Lehmann, 1963, p. 599):
$$HL = \\text{median}\\left(\\frac{x_i + x_j}{2} \\mid 1 \\leq i \\leq j \\leq n\\right)$$
Before, After and Alternatives
------------------------------
Before this measure you might want an impression using a frequency table or a visualisation:
* [tab_frequency](../other/table_frequency.html#tab_frequency) for a frequency table
* [vi_bar_stacked_single](../visualisations/vis_bar_stacked_single.html#vi_bar_stacked_single) for Single Stacked Bar-Chart
* [vi_bar_dual_axis](../visualisations/vis_bar_dual_axis.html#vi_bar_dual_axis) for Dual-Axis Bar Chart
After this you might want some other descriptive measures:
* [me_consensus](../measures/meas_consensus.html#me_consensus) for the Consensus
* [me_median](../measures/meas_median.html#me_median) for the Median
* [me_quantiles](../measures/meas_quantiles.html#me_quantiles) for Quantiles
* [me_quartiles](../measures/meas_quartiles.html#me_quartiles) for Quartiles / Hinges
* [me_quartile_range](../measures/meas_quartile_range.html#me_quartile_range) for Interquartile Range, Semi-Interquartile Range and Mid-Quartile Range
or perform a test:
* [ts_sign_os](../tests/test_sign_os.html#ts_sign_os) for One-Sample Sign Test
* [ts_trinomial_os](../tests/test_trinomial_os.html#ts_trinomial_os) for One-Sample Trinomial Test
* [ts_wilcoxon_os](../tests/test_wilcoxon_os.html#ts_wilcoxon_os) for Wilcoxon Signed Rank Test (One-Sample)
References
----------
Hodges, J. L., & Lehmann, E. L. (1963). Estimates of location based on rank tests. *The Annals of Mathematical Statistics, 34*(2), 598–611. doi:10.1214/aoms/1177704172
Monahan, J. F. (1984). Algorithm 616: Fast computation of the Hodges-Lehmann location estimator. *ACM Transactions on Mathematical Software, 10*(3), 265–270. doi:10.1145/1271.319414
Walsh, J. E. (1949a). Applications of some significance tests for the median which are valid under very general conditions. *Journal of the American Statistical Association, 44*(247), 342–355. doi:10.1080/01621459.1949.10483311
Walsh, J. E. (1949b). Some significance tests for the median which are valid under very general conditions. *The Annals of Mathematical Statistics, 20*(1), 64–81. doi:10.1214/aoms/1177730091
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
--------
Example 1: Text Pandas Series
>>> import pandas as pd
>>> student_df = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv', sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = student_df['Teach_Motivate']
>>> order = {"Fully Disagree":1, "Disagree":2, "Neither disagree nor agree":3, "Agree":4, "Fully agree":5}
>>> me_hodges_lehmann_os(ex1, levels=order)
np.float64(2.5)
Example 2: Numeric data
>>> ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
>>> me_hodges_lehmann_os(ex2)
3.5
Example 3: Text data with a specified order
>>> ex3 = ["a", "b", "f", "d", "e", "c"]
>>> order = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6}
>>> me_hodges_lehmann_os(ex3, levels=order)
np.float64(3.5)
'''
    if type(scores) is list:
        scores = pd.Series(scores)
    # remove missing values
    scores = scores.dropna()
    # apply levels
    if levels is not None:
        scores = scores.map(levels).astype('Int8')
    else:
        scores = pd.to_numeric(scores)
    # sample size
    n = len(scores)
    # convert to list
    scores = list(scores)
    # Walsh averages: every pair (i, j) with i <= j, self-pairs included
    walsh = []
    for i in range(0, n):
        for j in range(i, n):
            walsh.append((scores[i] + scores[j])/2)
    # Hodges-Lehmann Estimate
    HL = median(walsh)
    return HL
Functions
def me_hodges_lehmann_os(scores, levels=None)
Hodges-Lehmann Estimate (One-Sample)
The Hodges-Lehmann Estimate (Hodges & Lehmann, 1963) for a one-sample scenario is the median of the Walsh averages. The Walsh averages (Walsh, 1949a, 1949b) are the averages of every possible pair of scores. Each pair is counted only once: taking the second and fifth score is the same as taking the fifth and the second, so only one of these is used. Self-pairs, e.g. the third score with the third score, are also included.
In the one-sample case it is therefore a measure of central tendency, sometimes referred to as the pseudo-median.
The measure is also described at PeterStatistics.com
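The construction described above can be sketched in plain Python. This is only a minimal illustration, not the package's own code: the helper name `walsh_averages` is hypothetical, and `itertools.combinations_with_replacement` is chosen here because it yields each unordered pair once and includes the self-pairs.

```python
from itertools import combinations_with_replacement
from statistics import median


def walsh_averages(scores):
    """Return the Walsh averages of a sequence of numbers (hypothetical helper)."""
    # combinations_with_replacement gives every pair (x_i, x_j) with i <= j,
    # so each unordered pair appears once and self-pairs are included.
    return [(a + b) / 2 for a, b in combinations_with_replacement(scores, 2)]


sample = [1, 3, 8]
averages = walsh_averages(sample)
print(sorted(averages))  # [1.0, 2.0, 3.0, 4.5, 5.5, 8.0]
print(median(averages))  # 3.75
```

A sample of n scores gives n(n + 1)/2 Walsh averages, so the three scores here produce six averages.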
Parameters
scores : dataframe or list
the scores
levels : list or dictionary, optional
the scores in order
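In the examples on this page, `levels` is passed as a dictionary that maps each text label to its numeric rank. The recoding step might look roughly like the sketch below. It assumes plain Python lists with `None` marking a missing value, and the helper name `apply_levels` is hypothetical; the package itself does this with pandas (`dropna` followed by `Series.map`).

```python
def apply_levels(scores, levels):
    """Recode labels to numbers, dropping missing values (hypothetical helper)."""
    # None is treated as a missing value and skipped.
    # Unlike pandas' Series.map, a label missing from `levels` raises KeyError here.
    return [levels[s] for s in scores if s is not None]


order = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6}
coded = apply_levels(["a", "b", None, "f"], order)
print(coded)  # [1, 2, 6]
```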
Returns
HL : float
the Hodges-Lehmann Estimate
Notes
The formula used (Hodges & Lehmann, 1963, p. 599):
$$HL = \text{median}\left(\frac{x_i + x_j}{2} \mid 1 \leq i \leq j \leq n\right)$$
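As a small worked illustration (the sample values are made up for this purpose), take three scores and form every average with indices 1 ≤ i ≤ j ≤ n:

```latex
\begin{aligned}
x &= (2, 4, 9), \quad n = 3, \quad \tfrac{n(n+1)}{2} = 6 \text{ Walsh averages} \\
\tfrac{x_1 + x_1}{2} &= 2, \quad \tfrac{x_1 + x_2}{2} = 3, \quad \tfrac{x_1 + x_3}{2} = 5.5 \\
\tfrac{x_2 + x_2}{2} &= 4, \quad \tfrac{x_2 + x_3}{2} = 6.5, \quad \tfrac{x_3 + x_3}{2} = 9 \\
HL &= \text{median}(2, 3, 4, 5.5, 6.5, 9) = \tfrac{4 + 5.5}{2} = 4.75
\end{aligned}
```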
Before, After and Alternatives
Before this measure you might want an impression using a frequency table or a visualisation:
* tab_frequency for a frequency table
* vi_bar_stacked_single for Single Stacked Bar-Chart
* vi_bar_dual_axis for Dual-Axis Bar Chart
After this you might want some other descriptive measures:
* me_consensus for the Consensus
* me_median for the Median
* me_quantiles for Quantiles
* me_quartiles for Quartiles / Hinges
* me_quartile_range for Interquartile Range, Semi-Interquartile Range and Mid-Quartile Range
or perform a test:
* ts_sign_os for One-Sample Sign Test
* ts_trinomial_os for One-Sample Trinomial Test
* ts_wilcoxon_os for Wilcoxon Signed Rank Test (One-Sample)
References
Hodges, J. L., & Lehmann, E. L. (1963). Estimates of location based on rank tests. The Annals of Mathematical Statistics, 34(2), 598–611. doi:10.1214/aoms/1177704172
Monahan, J. F. (1984). Algorithm 616: Fast computation of the Hodges-Lehmann location estimator. ACM Transactions on Mathematical Software, 10(3), 265–270. doi:10.1145/1271.319414
Walsh, J. E. (1949a). Applications of some significance tests for the median which are valid under very general conditions. Journal of the American Statistical Association, 44(247), 342–355. doi:10.1080/01621459.1949.10483311
Walsh, J. E. (1949b). Some significance tests for the median which are valid under very general conditions. The Annals of Mathematical Statistics, 20(1), 64–81. doi:10.1214/aoms/1177730091
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
Example 1: Text Pandas Series
>>> import pandas as pd
>>> student_df = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv', sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = student_df['Teach_Motivate']
>>> order = {"Fully Disagree":1, "Disagree":2, "Neither disagree nor agree":3, "Agree":4, "Fully agree":5}
>>> me_hodges_lehmann_os(ex1, levels=order)
np.float64(2.5)
Example 2: Numeric data
>>> ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
>>> me_hodges_lehmann_os(ex2)
3.5
Example 3: Text data with a specified order
>>> ex3 = ["a", "b", "f", "d", "e", "c"]
>>> order = {"a":1, "b":2, "c":3, "d":4, "e":5, "f":6}
>>> me_hodges_lehmann_os(ex3, levels=order)
np.float64(3.5)
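For data with only a few distinct levels, such as the five-point scale in Example 2, the same value can be reached by counting tied pairs instead of listing every Walsh average. The sketch below is an assumption-laden alternative, not the package's method and not Monahan's (1984) algorithm; the function name `hl_from_counts` is hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hl_from_counts(scores):
    """Hodges-Lehmann estimate via frequencies of distinct values (hypothetical helper)."""
    counts = Counter(scores)
    n = len(scores)
    # How many index pairs (i <= j) produce each pair sum.
    pair_counts = Counter()
    for a, b in combinations_with_replacement(sorted(counts), 2):
        if a == b:
            # pairs within one value, self-pairs included: c(c + 1)/2
            pair_counts[a + b] += counts[a] * (counts[a] + 1) // 2
        else:
            pair_counts[a + b] += counts[a] * counts[b]
    total = n * (n + 1) // 2
    # 1-based ranks of the two middle Walsh averages (equal when total is odd)
    lower_rank = (total + 1) // 2
    upper_rank = total // 2 + 1
    cumulative = 0
    lower = upper = None
    for pair_sum in sorted(pair_counts):
        cumulative += pair_counts[pair_sum]
        if lower is None and cumulative >= lower_rank:
            lower = pair_sum / 2
        if cumulative >= upper_rank:
            upper = pair_sum / 2
            break
    return (lower + upper) / 2


ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
print(hl_from_counts(ex2))  # 3.5
```

With k distinct values, the loop visits k(k + 1)/2 value pairs rather than n(n + 1)/2 score pairs, which is the only design motivation here.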