Module stikpetP.measures.meas_mode
import pandas as pd
import numpy as np
def me_mode(data, allEq="none"):
    '''
Mode
----
The mode is a measure of central tendency, defined as “the abscissa corresponding to the ordinate of maximum frequency” (Pearson, 1895, p. 345). A more modern definition is “the most common value obtained in a set of observations” (Weisstein, 2002).
The word mode might even come from the French word 'mode', which means fashion. Fashion is what most people wear, so the mode is the option most people chose.
If one category has the highest frequency, this category is the modal category; if two or more categories share the same highest frequency, each of them is a mode. A set with only one mode is sometimes called unimodal, with two modes bimodal, with three trimodal, etc. For two or more modes, the term multimodal can also be used.
An advantage of the mode over many other measures of central tendency (like the median and mean) is that it can be determined even for nominal data.
A video on the mode is available [here](https://youtu.be/oPpTE8qt2go).
This function is shown in this [YouTube video](https://youtu.be/3MEz6OJI4rE) and the measure is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Measures/Mode.html)
Parameters
----------
data : list or pandas series
the scores to determine the mode from
allEq : {"none", "all"}, optional
what to do if all categories have the same frequency: "none" reports no mode, "all" treats every category as a mode. Default is "none".
Returns
-------
A dataframe with:
* *mode*, the mode(s)
* *mode freq.*, frequency of the mode
Notes
-----
One small controversy exists if all categories have the same frequency. In this case none of them has a higher occurrence than the others, so none of them would be the mode (see for example Spiegel & Stephens, 2008, p. 64; Larson & Farber, 2014, p. 69). This rule is applied when *allEq="none"*, which is the default.
On a rare occasion someone might argue that if all categories have the same frequency, then all categories are part of the mode since they all have the highest frequency. This is used when *allEq="all"*.
Before, After and Alternatives
------------------------------
Before this, an impression from a frequency table or a visualisation might be helpful:
* [tab_frequency](../other/table_frequency.html#tab_frequency)
* [vi_bar_simple](../visualisations/vis_bar_simple.html#vi_bar_simple) for Simple Bar Chart
* [vi_cleveland_dot_plot](../visualisations/vis_cleveland_dot_plot.html#vi_cleveland_dot_plot) for Cleveland Dot Plot
* [vi_dot_plot](../visualisations/vis_dot_plot.html#vi_dot_plot) for Dot Plot
* [vi_pareto_chart](../visualisations/vis_pareto_chart.html#vi_pareto_chart) for Pareto Chart
* [vi_pie](../visualisations/vis_pie.html#vi_pie) for Pie Chart
After this you might want some variation measure:
* [me_qv](../measures/meas_qv.html#me_qv) for Measures of Qualitative Variation
or perform a test:
* [ts_pearson_gof](../tests/test_pearson_gof.html#ts_pearson_gof) for Pearson Chi-Square Goodness-of-Fit Test
* [ts_freeman_tukey_gof](../tests/test_freeman_tukey_gof.html#ts_freeman_tukey_gof) for Freeman-Tukey Test of Goodness-of-Fit
* [ts_freeman_tukey_read](../tests/test_freeman_tukey_read.html#ts_freeman_tukey_read) for Freeman-Tukey-Read Test of Goodness-of-Fit
* [ts_g_gof](../tests/test_g_gof.html#ts_g_gof) for G (Likelihood Ratio) Goodness-of-Fit Test
* [ts_mod_log_likelihood_gof](../tests/test_mod_log_likelihood_gof.html#ts_mod_log_likelihood_gof) for Mod-Log Likelihood Test of Goodness-of-Fit
* [ts_multinomial_gof](../tests/test_multinomial_gof.html#ts_multinomial_gof) for Multinomial Goodness-of-Fit Test
* [ts_neyman_gof](../tests/test_neyman_gof.html#ts_neyman_gof) for Neyman Test of Goodness-of-Fit
* [ts_powerdivergence_gof](../tests/test_powerdivergence_gof.html#ts_powerdivergence_gof) for Power Divergence GoF Test
If you are looking to determine the mode of binned data use:
* [me_mode_bin](../measures/meas_mode_bin.html#me_mode_bin) for the mode with binned data
References
----------
Larson, R., & Farber, E. (2014). *Elementary statistics: Picturing the world* (6th ed.). Pearson.
Pearson, K. (1895). Contributions to the mathematical theory of evolution. II. Skew variation in homogeneous material. *Philosophical Transactions of the Royal Society of London. (A.), 186*, 343–414. https://doi.org/10.1098/rsta.1895.0010
Spiegel, M. R., & Stephens, L. J. (2008). *Schaum’s outline of theory and problems of statistics* (4th ed.). McGraw-Hill.
Weisstein, E. W. (2002). *CRC concise encyclopedia of mathematics* (2nd ed.). Chapman & Hall/CRC.
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
--------
Example 1: pandas series
>>> import pandas as pd
>>> df1 = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv', sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = df1['mar1']
>>> me_mode(ex1)
mode mode freq.
0 [MARRIED] 972
Example 2: a list
>>> ex2 = ["MARRIED", "DIVORCED", "MARRIED", "SEPARATED", "DIVORCED", "NEVER MARRIED", "DIVORCED", "DIVORCED", "NEVER MARRIED", "MARRIED", "MARRIED", "MARRIED", "SEPARATED", "DIVORCED", "NEVER MARRIED", "NEVER MARRIED", "DIVORCED", "DIVORCED", "MARRIED"]
>>> me_mode(ex2)
mode mode freq.
0 [DIVORCED] 7
Example 3: Multi-Mode
>>> ex3a = [1, 1, 2, 3, 3, 4, 5, 6, 6]
>>> me_mode(ex3a)
mode mode freq.
0 [1.0, 3.0, 6.0] 2
>>> ex3b = ["MARRIED", "DIVORCED", "MARRIED", "DIVORCED", "DIVORCED", "NEVER MARRIED", "MARRIED"]
>>> me_mode(ex3b)
mode mode freq.
0 [MARRIED, DIVORCED] 3
Example 4: All Equal
>>> ex4a = [1, 1, 2, 2, 3, 3, 6, 6]
>>> me_mode(ex4a)
mode mode freq.
0 [no mode] na
>>> ex4b = [1, 1, 2, 2, 3, 3, 6, 6]
>>> me_mode(ex4b, allEq="all")
mode mode freq.
0 [1.0, 2.0, 3.0, 6.0] 2
'''
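    # convert a plain list to a pandas Series so value_counts() is available
    if isinstance(data, list):
        data = pd.Series(data)

    # frequency of each category (value_counts() skips missing values by default)
    freq = data.value_counts()
    maxCount = freq.max()

    # collect every category whose frequency equals the maximum;
    # np.append on an empty list yields a float array, so integer scores
    # are reported as floats (e.g. 1.0)
    modes = []
    for i in range(len(freq)):
        if freq.values[i]==maxCount:
            modes = np.append(modes, freq.index[i])

    # if every category shares the maximum frequency, report no mode
    # unless allEq="all" was requested (this also applies to a single category)
    if len(modes)==len(freq) and allEq=="none":
        modes = ['no mode']
        maxCount = "na"

    res = pd.DataFrame(list([[modes, maxCount]]), columns = ["mode", "mode freq."])
    return res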
Functions
def me_mode(data, allEq='none')
Mode
The mode is a measure of central tendency, defined as “the abscissa corresponding to the ordinate of maximum frequency” (Pearson, 1895, p. 345). A more modern definition is “the most common value obtained in a set of observations” (Weisstein, 2002).
The word mode might even come from the French word 'mode', which means fashion. Fashion is what most people wear, so the mode is the option most people chose.
If one category has the highest frequency, this category is the modal category; if two or more categories share the same highest frequency, each of them is a mode. A set with only one mode is sometimes called unimodal, with two modes bimodal, with three trimodal, etc. For two or more modes, the term multimodal can also be used.
An advantage of the mode over many other measures of central tendency (like the median and mean) is that it can be determined even for nominal data.
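As a minimal sketch of the definition above (not the package's implementation), the modes of nominal data can be found by counting each category and keeping those that reach the highest count. The helper names `find_modes` and `modality` are illustrative assumptions, and the data is made up.

```python
from collections import Counter

def find_modes(values):
    """Return (modes, frequency) for a list of hashable values."""
    counts = Counter(values)
    if not counts:
        return [], 0
    top = max(counts.values())
    # keep every category that reaches the highest frequency, in first-seen order
    modes = [cat for cat, n in counts.items() if n == top]
    return modes, top

def modality(modes):
    """Name the number of modes: unimodal, bimodal, trimodal or multimodal."""
    if not modes:
        return "no mode"
    names = {1: "unimodal", 2: "bimodal", 3: "trimodal"}
    return names.get(len(modes), "multimodal")

status = ["MARRIED", "DIVORCED", "MARRIED", "DIVORCED", "DIVORCED", "NEVER MARRIED", "MARRIED"]
modes, freq = find_modes(status)
print(modes, freq, modality(modes))
```

No ordering or arithmetic is applied to the categories, only counting, which is why the approach works for nominal data.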
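A video on the mode is available [here](https://youtu.be/oPpTE8qt2go).
This function is shown in this [YouTube video](https://youtu.be/3MEz6OJI4rE) and the measure is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Measures/Mode.html).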
Parameters
data : list or pandas series
- the scores to determine the mode from
allEq : {"none", "all"}, optional
- what to do if all categories have the same frequency: "none" reports no mode, "all" treats every category as a mode. Default is "none".
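Both input types end up as a pandas Series, whose `value_counts()` method supplies the frequencies. The sketch below, using made-up data, shows that step in isolation. Note that `value_counts()` skips missing values by default, so they do not count towards the mode.

```python
import pandas as pd

scores = ["MARRIED", None, "DIVORCED", "MARRIED"]  # made-up data with one missing value

# a list is wrapped in a Series; a Series could be used directly
freq = pd.Series(scores).value_counts()

print(freq["MARRIED"])  # frequency of one category
print(len(freq))        # number of distinct, non-missing categories
```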
Returns
A dataframe with:
- mode, the mode(s)
- mode freq., frequency of the mode
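Because the modes are stored together in a single cell, reading the result takes one extra indexing step. The sketch below builds a dataframe by hand with the same two column names (the values are copied from Example 3) instead of calling `me_mode`, so it runs without the package installed.

```python
import pandas as pd

# stand-in for the dataframe returned by me_mode (same column names)
res = pd.DataFrame([[["MARRIED", "DIVORCED"], 3]], columns=["mode", "mode freq."])

modes = list(res.loc[0, "mode"])      # the mode(s) as a plain Python list
mode_freq = res.loc[0, "mode freq."]  # the shared frequency of the mode(s)

print(modes, mode_freq)
```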
Notes
One small controversy exists if all categories have the same frequency. In this case none of them has a higher occurrence than the others, so none of them would be the mode (see for example Spiegel & Stephens, 2008, p. 64; Larson & Farber, 2014, p. 69). This rule is applied when allEq="none", which is the default.
On a rare occasion someone might argue that if all categories have the same frequency, then all categories are part of the mode since they all have the highest frequency. This is used when allEq="all".
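The two conventions can be sketched without pandas as follows. This illustrates the rule described above and is not the package's code: the function name `modes_with_rule` and the `all_eq` argument are assumptions that mirror allEq, and the sketch returns a tuple with `None` where `me_mode` reports "na".

```python
from collections import Counter

def modes_with_rule(values, all_eq="none"):
    """Return (modes, frequency); modes is empty when no mode is reported."""
    counts = Counter(values)
    if not counts:
        return [], None
    top = max(counts.values())
    modes = [cat for cat, n in counts.items() if n == top]
    # every category ties for the highest frequency
    # (this also holds when there is only one category)
    if len(modes) == len(counts) and all_eq == "none":
        return [], None
    return modes, top

data = [1, 1, 2, 2, 3, 3, 6, 6]
print(modes_with_rule(data))                # no mode reported
print(modes_with_rule(data, all_eq="all"))  # every category is a mode
```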
Before, After and Alternatives
Before this, an impression from a frequency table or a visualisation might be helpful:
- tab_frequency
- vi_bar_simple for Simple Bar Chart
- vi_cleveland_dot_plot for Cleveland Dot Plot
- vi_dot_plot for Dot Plot
- vi_pareto_chart for Pareto Chart
- vi_pie for Pie Chart
After this you might want some variation measure:
- me_qv for Measures of Qualitative Variation
or perform a test:
- ts_pearson_gof for Pearson Chi-Square Goodness-of-Fit Test
- ts_freeman_tukey_gof for Freeman-Tukey Test of Goodness-of-Fit
- ts_freeman_tukey_read for Freeman-Tukey-Read Test of Goodness-of-Fit
- ts_g_gof for G (Likelihood Ratio) Goodness-of-Fit Test
- ts_mod_log_likelihood_gof for Mod-Log Likelihood Test of Goodness-of-Fit
- ts_multinomial_gof for Multinomial Goodness-of-Fit Test
- ts_neyman_gof for Neyman Test of Goodness-of-Fit
- ts_powerdivergence_gof for Power Divergence GoF Test
If you are looking to determine the mode of binned data use:
- me_mode_bin for the mode with binned data
References
Larson, R., & Farber, E. (2014). Elementary statistics: Picturing the world (6th ed.). Pearson.
Pearson, K. (1895). Contributions to the mathematical theory of evolution. II. Skew variation in homogeneous material. Philosophical Transactions of the Royal Society of London. (A.), 186, 343–414. https://doi.org/10.1098/rsta.1895.0010
Spiegel, M. R., & Stephens, L. J. (2008). Schaum’s outline of theory and problems of statistics (4th ed.). McGraw-Hill.
Weisstein, E. W. (2002). CRC concise encyclopedia of mathematics (2nd ed.). Chapman & Hall/CRC.
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
Example 1: pandas series
>>> import pandas as pd
>>> df1 = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv', sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = df1['mar1']
>>> me_mode(ex1)
        mode  mode freq.
0  [MARRIED]         972
Example 2: a list
>>> ex2 = ["MARRIED", "DIVORCED", "MARRIED", "SEPARATED", "DIVORCED", "NEVER MARRIED", "DIVORCED", "DIVORCED", "NEVER MARRIED", "MARRIED", "MARRIED", "MARRIED", "SEPARATED", "DIVORCED", "NEVER MARRIED", "NEVER MARRIED", "DIVORCED", "DIVORCED", "MARRIED"]
>>> me_mode(ex2)
         mode  mode freq.
0  [DIVORCED]           7
Example 3: Multi-Mode
>>> ex3a = [1, 1, 2, 3, 3, 4, 5, 6, 6]
>>> me_mode(ex3a)
              mode  mode freq.
0  [1.0, 3.0, 6.0]           2
>>> ex3b = ["MARRIED", "DIVORCED", "MARRIED", "DIVORCED", "DIVORCED", "NEVER MARRIED", "MARRIED"]
>>> me_mode(ex3b)
                  mode  mode freq.
0  [MARRIED, DIVORCED]           3
Example 4: All Equal
>>> ex4a = [1, 1, 2, 2, 3, 3, 6, 6]
>>> me_mode(ex4a)
        mode mode freq.
0  [no mode]         na
>>> ex4b = [1, 1, 2, 2, 3, 3, 6, 6]
>>> me_mode(ex4b, allEq="all")
                   mode  mode freq.
0  [1.0, 2.0, 3.0, 6.0]           2
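The numeric modes in Examples 3 and 4 appear as floats (1.0 rather than 1) even though the input holds integers. Judging from the module source, this seems to come from `np.append` being applied to an initially empty list, which NumPy treats as an empty float array. The sketch below reproduces the effect on made-up data and shows one possible way to keep the original type; the alternative is a suggestion, not part of the package.

```python
import numpy as np
import pandas as pd

freq = pd.Series([1, 1, 2, 3, 3]).value_counts()
max_count = freq.max()

# appending to an empty list promotes the integers to float64
as_float = []
for value in freq[freq == max_count].index:
    as_float = np.append(as_float, value)

# selecting the tied categories directly keeps the integer values
as_int = freq[freq == max_count].index.tolist()

print(as_float.dtype, sorted(as_int))
```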