Module stikpetP.visualisations.vis_bar_simple

import matplotlib.pyplot as plt
import pandas as pd

def vi_bar_simple(data, varname=None, height="count"):
    '''
    Simple Bar Chart
    ----------------
    
    A bar-chart is defined as “a graph in which bars of varying height with spaces between them are used to display data for variables defined by qualities or categories” (Zedeck, 2014, p. 20).
    
    A [YouTube](https://youtu.be/zT52FTyC6P8) video on bar charts.

    This function is shown in this [YouTube video](https://youtu.be/-DnZbLV2dr4) and the visualisation is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Visualisations/bar-chart.html).
    
    Parameters
    ----------
    data : list or pandas Series
        the categorical data to plot
    varname : string, optional
        name of the variable, used as the label of the horizontal axis
    height : {"count", "percent"}, optional
        what the bar height should represent. Default is "count"
    
    Notes
    -----
    The function uses the pandas *plot* method (with kind='bar'), which draws the chart through matplotlib's *pyplot* library.
    
    As a guideline for the proportions of the chart there is a rule of thumb known as the 'three quarter high rule' (Pitts, 1971): the vertical axis should be 3/4 of the length of the horizontal axis. So if the horizontal axis is 20 cm long, the vertical axis should be 3/4 * 20 = 15 cm high.
    
    According to Singh (2009), vertical bars are preferred over horizontal bars since they are easier on the eye. However, if the category names are long, some might become unreadable, and a chart with horizontally placed bars might then be preferred.
    
    One of the earliest known bar charts, by William Playfair (1786), has the bars placed horizontally. There is an earlier bar chart by Oresme (1486), but that illustrates a theoretical concept rather than descriptive statistics.

    Before, After and Alternatives
    ------------------------------
    Before the visualisation you might first want to get an impression using a frequency table:
    * [tab_frequency](../other/table_frequency.html#tab_frequency)

    After visualisation you might want some descriptive measures:
    * [me_mode](../measures/meas_mode.html#me_mode) for the mode
    * [me_qv](../measures/meas_qv.html#me_qv) for Measures of Qualitative Variation

    or perform a test:
    * [ts_pearson_gof](../tests/test_pearson_gof.html#ts_pearson_gof) for Pearson Chi-Square Goodness-of-Fit Test
    * [ts_freeman_tukey_gof](../tests/test_freeman_tukey_gof.html#ts_freeman_tukey_gof) for Freeman-Tukey Test of Goodness-of-Fit
    * [ts_freeman_tukey_read](../tests/test_freeman_tukey_read.html#ts_freeman_tukey_read) for Freeman-Tukey-Read Test of Goodness-of-Fit
    * [ts_g_gof](../tests/test_g_gof.html#ts_g_gof) for G (Likelihood Ratio) Goodness-of-Fit Test
    * [ts_mod_log_likelihood_gof](../tests/test_mod_log_likelihood_gof.html#ts_mod_log_likelihood_gof) for Mod-Log Likelihood Test of Goodness-of-Fit
    * [ts_multinomial_gof](../tests/test_multinomial_gof.html#ts_multinomial_gof) for Multinomial Goodness-of-Fit Test
    * [ts_neyman_gof](../tests/test_neyman_gof.html#ts_neyman_gof) for Neyman Test of Goodness-of-Fit
    * [ts_powerdivergence_gof](../tests/test_powerdivergence_gof.html#ts_powerdivergence_gof) for Power Divergence GoF Test
    
    Alternatives for this visualisation could be:
    * [vi_cleveland_dot_plot](../visualisations/vis_cleveland_dot_plot.html#vi_cleveland_dot_plot) for Cleveland Dot Plot
    * [vi_dot_plot](../visualisations/vis_dot_plot.html#vi_dot_plot) for Dot Plot
    * [vi_pareto_chart](../visualisations/vis_pareto_chart.html#vi_pareto_chart) for Pareto Chart
    * [vi_pie](../visualisations/vis_pie.html#vi_pie) for Pie Chart
    
    References
    ----------
    Oresme, N. (1486). *Tractatus de latitudinibus formarum*. (B. Pelacani da Parma, Ed.). Mathaeus Cerdonis.
    
    Pitts, C. E. (1971). *Introduction to educational psychology: An operant conditioning approach*. Crowell.
    
    Playfair, W. (1786). *The commercial and political atlas*. Debrett; Robinson; and Sewell.
    
    Singh, G. (2009). *Map work and practical geography* (4th ed.). Vikas Publishing House Pvt Ltd.
    
    Zedeck, S. (Ed.). (2014). *APA dictionary of statistics and research methods*. American Psychological Association.
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076
    
    Examples
    --------
    Example 1: pandas series
    >>> df1 = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv', sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
    >>> ex1 = df1['mar1']
    >>> vi_bar_simple(ex1);
    >>> vi_bar_simple(ex1, varname="marital status", height="percent");
    
    Example 2: a list
    >>> ex2 = ["MARRIED", "DIVORCED", "MARRIED", "SEPARATED", "DIVORCED", "NEVER MARRIED", "DIVORCED", "DIVORCED", "NEVER MARRIED", "MARRIED", "MARRIED", "MARRIED", "SEPARATED", "DIVORCED", "NEVER MARRIED", "NEVER MARRIED", "DIVORCED", "DIVORCED", "MARRIED"]
    >>> vi_bar_simple(ex2);
    
    '''
    
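    # Accept plain lists by converting them to a pandas Series
    if isinstance(data, list):
        data = pd.Series(data)
    
    # Frequency of each category, sorted from most to least frequent
    fr = data.value_counts()
    
    if height == "count":
        fr.plot(kind='bar')
        plt.ylabel('Frequency')
    
    elif height == "percent":
        # Convert the counts to percentages of the total
        perc = fr / fr.sum() * 100
        perc.plot(kind='bar')
        plt.ylabel('Percent')
    
    plt.xlabel(varname)
    plt.xticks(rotation=45)
    plt.show()  # call the function; a bare `plt.show` does nothing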

Functions

def vi_bar_simple(data, varname=None, height='count')

Simple Bar Chart

A bar-chart is defined as “a graph in which bars of varying height with spaces between them are used to display data for variables defined by qualities or categories” (Zedeck, 2014, p. 20).

A YouTube video on bar charts.

This function is shown in this YouTube video and the visualisation is also described at PeterStatistics.com.

Parameters

data : list or pandas Series
    the categorical data to plot
varname : string, optional
    name of the variable, used as the label of the horizontal axis
height : {"count", "percent"}, optional
    what the bar height should represent. Default is "count"
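
The two height options can be illustrated with a minimal sketch, assuming a small hypothetical sample (the values are made up for illustration and do not come from the package's example data). With "count" the bar heights are the raw frequencies; with "percent" each frequency is divided by the total and multiplied by 100, as in the function's source.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
data = pd.Series(["MARRIED", "DIVORCED", "MARRIED", "SEPARATED"])

# Bar heights when height="count": frequency of each category
counts = data.value_counts()

# Bar heights when height="percent": share of the total, times 100
percents = counts / counts.sum() * 100

print(counts)
print(percents)
```

The percentages always add up to 100, so the "percent" chart has the same shape as the "count" chart and differs only in the scale of the vertical axis.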

Notes

The function uses the pandas plot method (with kind='bar'), which draws the chart through matplotlib's pyplot library.

As a guideline for the proportions of the chart there is a rule of thumb known as the 'three quarter high rule' (Pitts, 1971): the vertical axis should be 3/4 of the length of the horizontal axis. So if the horizontal axis is 20 cm long, the vertical axis should be 3/4 * 20 = 15 cm high.
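
The rule of thumb above could be approximated in matplotlib by fixing the figure size before plotting. This is only a sketch: vi_bar_simple itself does not set a figure size, the width of 8 inches is an arbitrary assumption, and figsize controls the whole figure rather than the exact axis lengths, so the ratio of the axes is only roughly 3/4.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

fig_width = 8.0                  # hypothetical width in inches
fig_height = 0.75 * fig_width    # three quarter high rule: 6.0 inches

# Hypothetical sample data, used only for illustration
fr = pd.Series(["A", "B", "A", "C", "A"]).value_counts()

fig, ax = plt.subplots(figsize=(fig_width, fig_height))
fr.plot(kind="bar", ax=ax)       # draw the bars on the pre-sized axes
ax.set_ylabel("Frequency")
```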

According to Singh (2009), vertical bars are preferred over horizontal bars since they are easier on the eye. However, if the category names are long, some might become unreadable, and a chart with horizontally placed bars might then be preferred.
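
vi_bar_simple always draws vertical bars. A horizontal variant for long category names could be sketched directly with the pandas plot method and kind="barh"; the data below is a hypothetical sample and this variant is not part of the package function.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data with fairly long category names
data = pd.Series(["NEVER MARRIED", "DIVORCED", "NEVER MARRIED", "MARRIED"])
fr = data.value_counts()

fig, ax = plt.subplots()
fr.plot(kind="barh", ax=ax)   # horizontal bars: categories go on the vertical axis
ax.set_xlabel("Frequency")    # the frequencies now run along the horizontal axis
```

With horizontal bars the category labels are written left to right next to each bar, so they do not need to be rotated.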

One of the earliest known bar charts, by William Playfair (1786), has the bars placed horizontally. There is an earlier bar chart by Oresme (1486), but that illustrates a theoretical concept rather than descriptive statistics.

Before, After and Alternatives

Before the visualisation you might first want to get an impression using a frequency table:

* tab_frequency

After visualisation you might want some descriptive measures:

* me_mode for the mode
* me_qv for Measures of Qualitative Variation

or perform a test:

* ts_pearson_gof for Pearson Chi-Square Goodness-of-Fit Test
* ts_freeman_tukey_gof for Freeman-Tukey Test of Goodness-of-Fit
* ts_freeman_tukey_read for Freeman-Tukey-Read Test of Goodness-of-Fit
* ts_g_gof for G (Likelihood Ratio) Goodness-of-Fit Test
* ts_mod_log_likelihood_gof for Mod-Log Likelihood Test of Goodness-of-Fit
* ts_multinomial_gof for Multinomial Goodness-of-Fit Test
* ts_neyman_gof for Neyman Test of Goodness-of-Fit
* ts_powerdivergence_gof for Power Divergence GoF Test

Alternatives for this visualisation could be:

* vi_cleveland_dot_plot for Cleveland Dot Plot
* vi_dot_plot for Dot Plot
* vi_pareto_chart for Pareto Chart
* vi_pie for Pie Chart

References

Oresme, N. (1486). Tractatus de latitudinibus formarum. (B. Pelacani da Parma, Ed.). Mathaeus Cerdonis.

Pitts, C. E. (1971). Introduction to educational psychology: An operant conditioning approach. Crowell.

Playfair, W. (1786). The commercial and political atlas. Debrett; Robinson; and Sewell.

Singh, G. (2009). Map work and practical geography (4th ed.). Vikas Publishing House Pvt Ltd.

Zedeck, S. (Ed.). (2014). APA dictionary of statistics and research methods. American Psychological Association.

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076

Examples

Example 1: pandas series

>>> df1 = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv', sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = df1['mar1']
>>> vi_bar_simple(ex1);
>>> vi_bar_simple(ex1, varname="marital status", height="percent");

Example 2: a list

>>> ex2 = ["MARRIED", "DIVORCED", "MARRIED", "SEPARATED", "DIVORCED", "NEVER MARRIED", "DIVORCED", "DIVORCED", "NEVER MARRIED", "MARRIED", "MARRIED", "MARRIED", "SEPARATED", "DIVORCED", "NEVER MARRIED", "NEVER MARRIED", "DIVORCED", "DIVORCED", "MARRIED"]
>>> vi_bar_simple(ex2);