Module stikpetP.visualisations.vis_bar_simple
import matplotlib.pyplot as plt
import pandas as pd
def vi_bar_simple(data, varname=None, height="count"):
    '''
Simple Bar Chart
----------------
A bar-chart is defined as “a graph in which bars of varying height with spaces between them are used to display data for variables defined by qualities or categories” (Zedeck, 2014, p. 20).
A [YouTube](https://youtu.be/zT52FTyC6P8) video on bar charts.
This function is shown in this [YouTube video](https://youtu.be/-DnZbLV2dr4) and the visualisation is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Visualisations/bar-chart.html)
Parameters
----------
data : list or pandas data series
the data
varname : string, optional
name for the variable
height : {"count", "percent"}, optional
indicate what the height should represent. Default is "count"
Notes
-----
The function uses the pandas *Series.plot* method with kind='bar' (which draws through matplotlib) and sets the axis labels with *pyplot*.
As a guideline for the size of the chart, there is a rule of thumb known as the 'three quarter high rule' (Pitts, 1971): the height of the vertical axis should be 3/4 of the length of the horizontal axis. So if the horizontal axis is 20 cm long, the vertical axis should be 3/4 * 20 = 15 cm high.
According to Singh (2009), vertical bars are preferred over horizontal bars since they are easier on the eye. However, long category names might become unreadable on a vertical bar chart; a chart with the bars placed horizontally might then be preferred.
One of the earliest known bar charts, by William Playfair (1786), has the bars placed horizontally. An earlier bar chart by Oresme (1486) exists, but it illustrates a theoretical concept rather than descriptive statistics.
Before, After and Alternatives
------------------------------
Before the visualisation you might first want to get an impression using a frequency table:
* [tab_frequency](../other/table_frequency.html#tab_frequency)
After visualisation you might want some descriptive measures:
* [me_mode](../measures/meas_mode.html#me_mode) for the mode
* [me_qv](../measures/meas_qv.html#me_qv) for Measures of Qualitative Variation
or perform a test:
* [ts_pearson_gof](../tests/test_pearson_gof.html#ts_pearson_gof) for Pearson Chi-Square Goodness-of-Fit Test
* [ts_freeman_tukey_gof](../tests/test_freeman_tukey_gof.html#ts_freeman_tukey_gof) for Freeman-Tukey Test of Goodness-of-Fit
* [ts_freeman_tukey_read](../tests/test_freeman_tukey_read.html#ts_freeman_tukey_read) for Freeman-Tukey-Read Test of Goodness-of-Fit
* [ts_g_gof](../tests/test_g_gof.html#ts_g_gof) for G (Likelihood Ratio) Goodness-of-Fit Test
* [ts_mod_log_likelihood_gof](../tests/test_mod_log_likelihood_gof.html#ts_mod_log_likelihood_gof) for Mod-Log Likelihood Test of Goodness-of-Fit
* [ts_multinomial_gof](../tests/test_multinomial_gof.html#ts_multinomial_gof) for Multinomial Goodness-of-Fit Test
* [ts_neyman_gof](../tests/test_neyman_gof.html#ts_neyman_gof) for Neyman Test of Goodness-of-Fit
* [ts_powerdivergence_gof](../tests/test_powerdivergence_gof.html#ts_powerdivergence_gof) for Power Divergence GoF Test
Alternatives for this visualisation could be:
* [vi_cleveland_dot_plot](../visualisations/vis_cleveland_dot_plot.html#vi_cleveland_dot_plot) for Cleveland Dot Plot
* [vi_dot_plot](../visualisations/vis_dot_plot.html#vi_dot_plot) for Dot Plot
* [vi_pareto_chart](../visualisations/vis_pareto_chart.html#vi_pareto_chart) for Pareto Chart
* [vi_pie](../visualisations/vis_pie.html#vi_pie) for Pie Chart
References
----------
Oresme, N. (1486). *Tractatus de latitudinibus formarum*. (B. Pelacani da Parma, Ed.). Mathaeus Cerdonis.
Pitts, C. E. (1971). *Introduction to educational psychology: An operant conditioning approach*. Crowell.
Playfair, W. (1786). *The commercial and political atlas*. Debrett; Robinson; and Sewell.
Singh, G. (2009). *Map work and practical geography* (4th ed.). Vikas Publishing House Pvt Ltd.
Zedeck, S. (Ed.). (2014). *APA dictionary of statistics and research methods*. American Psychological Association.
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
--------
Example 1: pandas series
>>> df1 = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv', sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = df1['mar1']
>>> vi_bar_simple(ex1);
>>> vi_bar_simple(ex1, varname="marital status", height="percent");
Example 2: a list
>>> ex2 = ["MARRIED", "DIVORCED", "MARRIED", "SEPARATED", "DIVORCED", "NEVER MARRIED", "DIVORCED", "DIVORCED", "NEVER MARRIED", "MARRIED", "MARRIED", "MARRIED", "SEPARATED", "DIVORCED", "NEVER MARRIED", "NEVER MARRIED", "DIVORCED", "DIVORCED", "MARRIED"]
>>> vi_bar_simple(ex2);
'''
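    # Convert a plain list to a pandas Series so value_counts() is available
    if isinstance(data, list):
        data = pd.Series(data)

    # Frequency of each category, most frequent first
    fr = data.value_counts()

    if height == "count":
        fr.plot(kind='bar')
        plt.ylabel('Frequency')
    elif height == "percent":
        # Convert the counts to percentages of the total
        perc = fr / fr.sum() * 100
        perc.plot(kind='bar')
        plt.ylabel('Percent')

    plt.xlabel(varname)
    plt.xticks(rotation=45)
    # Display the figure
    plt.show()
    return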
Functions
def vi_bar_simple(data, varname=None, height='count')
Simple Bar Chart
A bar-chart is defined as “a graph in which bars of varying height with spaces between them are used to display data for variables defined by qualities or categories” (Zedeck, 2014, p. 20).
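A YouTube video on bar charts: https://youtu.be/zT52FTyC6P8
This function is shown in this YouTube video (https://youtu.be/-DnZbLV2dr4) and the visualisation is also described at PeterStatistics.com (https://peterstatistics.com/Terms/Visualisations/bar-chart.html).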
Parameters
data : list or pandas data series - the data
varname : string, optional - name for the variable
height : {"count", "percent"}, optional - indicate what the height should represent. Default is "count"
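The difference between the two `height` options can be illustrated without any plotting. The following is a minimal sketch using only the standard library; `bar_heights` is a hypothetical helper and not part of the package, which itself relies on the pandas `value_counts` method. Unlike the function documented here, the sketch raises an error for an unrecognised `height` value, which is an assumption about desirable behaviour rather than something the package does.

```python
from collections import Counter


def bar_heights(data, height="count"):
    """Hypothetical helper: bar heights per category, without plotting."""
    counts = Counter(data)
    if height == "count":
        # Raw frequency of each category
        return dict(counts)
    if height == "percent":
        # Frequency as a percentage of all observations
        total = sum(counts.values())
        return {cat: n / total * 100 for cat, n in counts.items()}
    raise ValueError('height must be "count" or "percent"')


sample = ["MARRIED", "DIVORCED", "MARRIED", "SEPARATED"]
print(bar_heights(sample, height="count"))
# {'MARRIED': 2, 'DIVORCED': 1, 'SEPARATED': 1}
print(bar_heights(sample, height="percent"))
# {'MARRIED': 50.0, 'DIVORCED': 25.0, 'SEPARATED': 25.0}
```

The percentages always sum to 100, so the two options change only the scale of the vertical axis, not the relative heights of the bars.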
Notes
The function uses the pandas Series.plot method with kind='bar' (which draws through matplotlib) and sets the axis labels with pyplot.
As a guideline for the size of the chart, there is a rule of thumb known as the 'three quarter high rule' (Pitts, 1971): the height of the vertical axis should be 3/4 of the length of the horizontal axis. So if the horizontal axis is 20 cm long, the vertical axis should be 3/4 * 20 = 15 cm high.
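The arithmetic of the three quarter high rule can be written as a small function. This is only a sketch; `three_quarter_size` is a hypothetical name and not part of the package.

```python
def three_quarter_size(width):
    """Return (width, height) with the height equal to 3/4 of the width."""
    return (width, 0.75 * width)


# A horizontal axis of 20 cm gives a vertical axis of 15 cm
print(three_quarter_size(20))  # (20, 15.0)
```

With matplotlib, such a tuple could be passed as `figsize` (in inches) when creating a figure. Note that `figsize` sets the size of the whole figure rather than the lengths of the axes, so the resulting ratio is only approximate.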
According to Singh (2009), vertical bars are preferred over horizontal bars since they are easier on the eye. However, long category names might become unreadable on a vertical bar chart; a chart with the bars placed horizontally might then be preferred.
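The function documented here always draws vertical bars. A horizontal variant could be sketched as follows, assuming pandas and matplotlib are installed; `bar_horizontal` is a hypothetical helper and not part of the package. It uses the pandas `kind="barh"` option in place of `kind="bar"`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import pandas as pd


def bar_horizontal(data, varname=None):
    """Hypothetical helper: frequency bar chart with horizontal bars."""
    fr = pd.Series(data).value_counts()
    # kind="barh" puts the categories on the vertical axis
    ax = fr.plot(kind="barh")
    ax.set_xlabel("Frequency")
    ax.set_ylabel(varname if varname is not None else "")
    return ax


ax = bar_horizontal(
    ["NEVER MARRIED", "MARRIED", "MARRIED", "DIVORCED"],
    varname="marital status",
)
```

Because the category names now run along the vertical axis, they are written horizontally and need no rotation, which is the main reason to prefer this layout for long names.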
One of the earliest known bar charts, by William Playfair (1786), has the bars placed horizontally. An earlier bar chart by Oresme (1486) exists, but it illustrates a theoretical concept rather than descriptive statistics.
Before, After and Alternatives
Before the visualisation you might first want to get an impression using a frequency table:
* tab_frequency
After visualisation you might want some descriptive measures:
* me_mode for the mode
* me_qv for Measures of Qualitative Variation
or perform a test:
* ts_pearson_gof for Pearson Chi-Square Goodness-of-Fit Test
* ts_freeman_tukey_gof for Freeman-Tukey Test of Goodness-of-Fit
* ts_freeman_tukey_read for Freeman-Tukey-Read Test of Goodness-of-Fit
* ts_g_gof for G (Likelihood Ratio) Goodness-of-Fit Test
* ts_mod_log_likelihood_gof for Mod-Log Likelihood Test of Goodness-of-Fit
* ts_multinomial_gof for Multinomial Goodness-of-Fit Test
* ts_neyman_gof for Neyman Test of Goodness-of-Fit
* ts_powerdivergence_gof for Power Divergence GoF Test
Alternatives for this visualisation could be:
* vi_cleveland_dot_plot for Cleveland Dot Plot
* vi_dot_plot for Dot Plot
* vi_pareto_chart for Pareto Chart
* vi_pie for Pie Chart
References
Oresme, N. (1486). Tractatus de latitudinibus formarum. (B. Pelacani da Parma, Ed.). Mathaeus Cerdonis.
Pitts, C. E. (1971). Introduction to educational psychology: An operant conditioning approach. Crowell.
Playfair, W. (1786). The commercial and political atlas. Debrett; Robinson; and Sewell.
Singh, G. (2009). Map work and practical geography (4th ed.). Vikas Publishing House Pvt Ltd.
Zedeck, S. (Ed.). (2014). APA dictionary of statistics and research methods. American Psychological Association.
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
Example 1: pandas series
>>> df1 = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv', sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = df1['mar1']
>>> vi_bar_simple(ex1);
>>> vi_bar_simple(ex1, varname="marital status", height="percent");
Example 2: a list
>>> ex2 = ["MARRIED", "DIVORCED", "MARRIED", "SEPARATED", "DIVORCED", "NEVER MARRIED", "DIVORCED", "DIVORCED", "NEVER MARRIED", "MARRIED", "MARRIED", "MARRIED", "SEPARATED", "DIVORCED", "NEVER MARRIED", "NEVER MARRIED", "DIVORCED", "DIVORCED", "MARRIED"]
>>> vi_bar_simple(ex2);