Module `stikpetP.visualisations.vis_bar_clustered`


import matplotlib.pyplot as plt
from ..other.table_cross import tab_cross

def vi_bar_clustered(field1, field2, order1=None, order2=None, percent=None):
    '''
    Clustered / Multiple Bar Chart
    ------------------------------
    
    A bar-chart is defined as “a graph in which bars of varying height with spaces between them are used to display data for variables defined by qualities or categories” (Zedeck, 2014, p. 20).
    
    The bars can be split into multiple bars based on another variable. This is then known as a multiple bar-chart (Kemp & Kemp, 2004, p. 150) or clustered bar-chart (Brase, 2009, p. 50; Griffith, 2007, p. 168).
    
    A clustered bar-chart can be defined as “a bar chart for comparing the frequencies of a categorical variable in two or more situations” (Upton & Cook, 2014, p. 283).
    
    The categories of the first field are placed on the horizontal axis, and the categories of the second field form the bars within each cluster.
    
    Parameters
    ----------
    field1 : pandas series
        data with categories for the rows
    field2 : pandas series
        data with categories for the columns
    order1 : list or dictionary, optional
        order for categories of field1
    order2 : list or dictionary, optional
        order for categories of field2
    percent : {None, "all", "row", "column"}, optional
        which percentages to show. Default is None (will show counts)
        
    Returns
    -------
    None
        The clustered bar chart is shown with matplotlib; no value is returned.
    
    References
    ----------
    Brase, C. (2009). *Understandable statistics* (9th ed.). Houghton Mifflin.
    
    Griffith, A. (2007). *SPSS for dummies*. Wiley.
    
    Kemp, S. M., & Kemp, S. (2004). *Business statistics demystified*. McGraw-Hill.
    
    Upton, G., & Cook, I. (2014). *Oxford: Dictionary of statistics* (3rd ed.). Oxford University Press.
    
    Zedeck, S. (Ed.). (2014). *APA dictionary of statistics and research methods*. American Psychological Association.
    
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076
    
    Examples
    --------
    Example 1: Clustered bar chart with percentages
    >>> import pandas as pd
    >>> file1 = "https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv"
    >>> df1 = pd.read_csv(file1, sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
    >>> vi_bar_clustered(df1['mar1'], df1['sex'], percent="column")
    
    Example 2: Specify order
    >>> order1 = ["DIVORCED", "WIDOWED", "SEPARATED", "MARRIED", "NEVER MARRIED"]
    >>> order2 = ["MALE", "FEMALE"]
    >>> vi_bar_clustered(df1['mar1'], df1['sex'], order1=order1, order2=order2)
        
    
    '''
    
    # Cross-tabulate the two fields, leaving the totals out of the table
    table = tab_cross(field1, field2, order1=order1, order2=order2, percent=percent, totals="exclude")
    table.plot(kind='bar')
    
    # Label the vertical axis according to what the bars represent
    if percent is None:
        plt.ylabel('Frequency')
    elif percent == "all":
        plt.ylabel('Overall percent')
    elif percent == "row":
        plt.ylabel('Percent of category')
    elif percent == "column":
        plt.ylabel('Percent of clusters')
    plt.show()
    
    return

Functions

def vi_bar_clustered(field1, field2, order1=None, order2=None, percent=None)

Clustered / Multiple Bar Chart

A bar-chart is defined as “a graph in which bars of varying height with spaces between them are used to display data for variables defined by qualities or categories” (Zedeck, 2014, p. 20).

The bars can be split into multiple bars based on another variable. This is then known as a multiple bar-chart (Kemp & Kemp, 2004, p. 150) or clustered bar-chart (Brase, 2009, p. 50; Griffith, 2007, p. 168).

A clustered bar-chart can be defined as “a bar chart for comparing the frequencies of a categorical variable in two or more situations” (Upton & Cook, 2014, p. 283).

The categories of the first field are placed on the horizontal axis, and the categories of the second field form the bars within each cluster.
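
To illustrate how the two fields map onto the chart, here is a minimal sketch that uses `pandas.crosstab` as a stand-in for the package's `tab_cross` (an assumption, since the internals of `tab_cross` are not shown on this page). The data are made up for illustration. Each row of the cross table becomes one cluster on the horizontal axis, and each column becomes one bar within every cluster.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import pandas as pd

# Made-up illustrative data (not taken from the GSS example file)
field1 = pd.Series(["MARRIED", "DIVORCED", "MARRIED", "WIDOWED", "MARRIED", "DIVORCED"], name="mar1")
field2 = pd.Series(["MALE", "FEMALE", "FEMALE", "FEMALE", "MALE", "MALE"], name="sex")

# Assumption: pd.crosstab plays the role of tab_cross here.
# Rows come from field1, columns from field2.
table = pd.crosstab(field1, field2)

# One cluster per row (field1 category), one bar per column (field2 category)
ax = table.plot(kind="bar")
ax.set_ylabel("Frequency")
```

In an interactive session the `matplotlib.use("Agg")` line can be dropped and `matplotlib.pyplot.show()` called to display the figure.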

Parameters

field1 : pandas series: data with categories for the rows
field2 : pandas series: data with categories for the columns
order1 : list or dictionary, optional: order for categories of field1
order2 : list or dictionary, optional: order for categories of field2
percent : {None, "all", "row", "column"}, optional: which percentages to show. Default is None (will show counts)
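
The three `percent` options can be pictured with the `normalize` argument of `pandas.crosstab`. This is only a sketch under the assumption that `"all"`, `"row"` and `"column"` correspond to `normalize="all"`, `"index"` and `"columns"`; how `tab_cross` computes its percentages is not shown on this page. The data are made up.

```python
import pandas as pd

# Made-up illustrative data
field1 = pd.Series(["A", "A", "B", "B"])
field2 = pd.Series(["x", "y", "x", "x"])

# percent=None: plain counts
counts = pd.crosstab(field1, field2)

# Assumed equivalents of the percent options, expressed as percentages
overall = pd.crosstab(field1, field2, normalize="all") * 100      # all cells sum to 100
by_row = pd.crosstab(field1, field2, normalize="index") * 100     # each row sums to 100
by_col = pd.crosstab(field1, field2, normalize="columns") * 100   # each column sums to 100
```

With row percentages the bars within one cluster add up to 100, while with column percentages the bars of one colour add up to 100 across the clusters.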

Returns

None: the clustered bar chart is shown with matplotlib; no value is returned.

References

Brase, C. (2009). Understandable statistics (9th ed.). Houghton Mifflin.

Griffith, A. (2007). SPSS for dummies. Wiley.

Kemp, S. M., & Kemp, S. (2004). Business statistics demystified. McGraw-Hill.

Upton, G., & Cook, I. (2014). Oxford: Dictionary of statistics (3rd ed.). Oxford University Press.

Zedeck, S. (Ed.). (2014). APA dictionary of statistics and research methods. American Psychological Association.

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076

Examples

Example 1: Clustered bar chart with percentages

>>> import pandas as pd
>>> file1 = "https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv"
>>> df1 = pd.read_csv(file1, sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> vi_bar_clustered(df1['mar1'], df1['sex'], percent="column")

Example 2: Specify order

>>> order1 = ["DIVORCED", "WIDOWED", "SEPARATED", "MARRIED", "NEVER MARRIED"]
>>> order2 = ["MALE", "FEMALE"]
>>> vi_bar_clustered(df1['mar1'], df1['sex'], order1=order1, order2=order2)
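
The effect of specifying an order can be approximated with `DataFrame.reindex` on a plain cross table. This is a hedged sketch: `pandas.crosstab` and `reindex` are stand-ins, and `tab_cross` may apply `order1` and `order2` differently. The data are made up.

```python
import pandas as pd

# Made-up illustrative data
field1 = pd.Series(["MARRIED", "DIVORCED", "WIDOWED", "MARRIED"], name="mar1")
field2 = pd.Series(["MALE", "FEMALE", "FEMALE", "FEMALE"], name="sex")

order1 = ["WIDOWED", "MARRIED", "DIVORCED"]  # desired order of the clusters
order2 = ["MALE", "FEMALE"]                  # desired order of the bars within a cluster

# By default crosstab sorts the categories alphabetically
table = pd.crosstab(field1, field2)

# Reorder rows and columns; combinations that do not occur stay at 0
ordered = table.reindex(index=order1, columns=order2, fill_value=0)
```

Plotting `ordered` with `kind="bar"` would then draw the clusters and the bars in the requested order.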
