Module stikpetP.visualisations.vis_bar_clustered
import matplotlib.pyplot as plt
from ..other.table_cross import tab_cross
def vi_bar_clustered(field1, field2, order1=None, order2=None, percent=None):
    '''
Clustered / Multiple Bar Chart
------------------------------
A bar-chart is defined as “a graph in which bars of varying height with spaces between them are used to display data for variables defined by qualities or categories” (Zedeck, 2014, p. 20).
The bars can be split into multiple bars based on another variable. This is then known as a multiple bar-chart (Kemp & Kemp, 2004, p. 150) or clustered bar-chart (Brase, 2009, p. 50; Griffith, 2007, p. 168).
It can be defined as “a bar chart for comparing the frequencies of a categorical variable in two or more situations” (Upton & Cook, 2014, p. 283).
The first field will be placed on the horizontal axis, and the second used for the clusters.
Parameters
----------
field1 : pandas series
data with categories for the rows
field2 : pandas series
data with categories for the columns
order1 : list or dictionary, optional
order for categories of field1
order2 : list or dictionary, optional
order for categories of field2
percent : {None, "all", "row", "column"}, optional
which percentages to show. Default is None (will show counts)
Returns
-------
None
the clustered bar chart is displayed with matplotlib; the function itself returns nothing
References
----------
Brase, C. (2009). *Understandable statistics* (9th ed.). Houghton Mifflin.
Griffith, A. (2007). *SPSS for dummies*. Wiley.
Kemp, S. M., & Kemp, S. (2004). *Business statistics demystified*. McGraw-Hill.
Upton, G., & Cook, I. (2014). *Oxford: Dictionary of statistics* (3rd ed.). Oxford University Press.
Zedeck, S. (Ed.). (2014). *APA dictionary of statistics and research methods*. American Psychological Association.
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
--------
Example 1: Clustered bar chart with percentages
>>> import pandas as pd
>>> file1 = "https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv"
>>> df1 = pd.read_csv(file1, sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> vi_bar_clustered(df1['mar1'], df1['sex'], percent="column")
Example 2: Specify order
>>> order1 = ["DIVORCED", "WIDOWED", "SEPARATED", "MARRIED", "NEVER MARRIED"]
>>> order2 = ["MALE", "FEMALE"]
>>> vi_bar_clustered(df1['mar1'], df1['sex'], order1=order1, order2=order2)
'''
    table = tab_cross(field1, field2, order1=order1, order2=order2, percent=percent, totals="exclude")
    table.plot(kind='bar')

    if percent is None:
        plt.ylabel('Frequency')
    else:
        if percent=="all":
            plt.ylabel('Overall percent')
        elif percent=="row":
            plt.ylabel('Percent of category')
        elif percent=="column":
            plt.ylabel("Percent of clusters")
    plt.show()
    return
Functions
def vi_bar_clustered(field1, field2, order1=None, order2=None, percent=None)
Clustered / Multiple Bar Chart
A bar-chart is defined as “a graph in which bars of varying height with spaces between them are used to display data for variables defined by qualities or categories” (Zedeck, 2014, p. 20).
The bars can be split into multiple bars based on another variable. This is then known as a multiple bar-chart (Kemp & Kemp, 2004, p. 150) or clustered bar-chart (Brase, 2009, p. 50; Griffith, 2007, p. 168).
It can be defined as “a bar chart for comparing the frequencies of a categorical variable in two or more situations” (Upton & Cook, 2014, p. 283).
The first field will be placed on the horizontal axis, and the second used for the clusters.
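The layout described above can be sketched with plain pandas and matplotlib. This is a minimal illustration, not the package's implementation: it uses `pandas.crosstab` in place of the package's `tab_cross`, and the function name `clustered_bar_sketch` and the toy data are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd


def clustered_bar_sketch(field1, field2):
    """Plot counts with field1 on the horizontal axis and one bar per field2 category."""
    # Rows are the categories of field1, columns the categories of field2.
    # pandas.crosstab stands in for tab_cross here (an assumption of this sketch).
    table = pd.crosstab(field1, field2)
    # DataFrame.plot draws one group of bars per row and one bar per column.
    ax = table.plot(kind="bar")
    ax.set_ylabel("Frequency")
    return table, ax


# Hypothetical toy data, not taken from the GSS example file.
status = pd.Series(["MARRIED", "MARRIED", "DIVORCED", "MARRIED", "DIVORCED"], name="mar1")
sex = pd.Series(["MALE", "FEMALE", "FEMALE", "FEMALE", "MALE"], name="sex")
table, ax = clustered_bar_sketch(status, sex)
```

Each row of the cross table becomes one cluster on the horizontal axis, and each column becomes one bar series within every cluster.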
Parameters
field1 : pandas series
    data with categories for the rows
field2 : pandas series
    data with categories for the columns
order1 : list or dictionary, optional
    order for categories of field1
order2 : list or dictionary, optional
    order for categories of field2
percent : {None, "all", "row", "column"}, optional
    which percentages to show. Default is None (will show counts)
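To make the `percent` options concrete, here is a small standard-library sketch of what each setting would compute under the conventional definitions (share of the grand total, of the row total, or of the column total). The helper `cross_percent` and the toy data are hypothetical, and the actual `tab_cross` output may differ in rounding or scaling.

```python
from collections import Counter


def cross_percent(pairs, percent=None):
    """Return {(row, column): value} holding counts or percentages of a cross table."""
    counts = Counter(pairs)
    if percent is None:
        return dict(counts)  # plain frequencies

    row_totals = Counter(row for row, _ in pairs)
    col_totals = Counter(col for _, col in pairs)
    total = len(pairs)

    result = {}
    for (row, col), n in counts.items():
        if percent == "all":
            base = total             # share of all observations
        elif percent == "row":
            base = row_totals[row]   # share within the field1 category
        elif percent == "column":
            base = col_totals[col]   # share within the field2 category
        else:
            raise ValueError("percent must be None, 'all', 'row' or 'column'")
        result[(row, col)] = 100 * n / base
    return result


# Hypothetical toy data: (field1 value, field2 value) per respondent.
pairs = [("MARRIED", "MALE"), ("MARRIED", "FEMALE"), ("DIVORCED", "FEMALE"),
         ("MARRIED", "FEMALE"), ("DIVORCED", "MALE")]
overall = cross_percent(pairs, percent="all")
by_row = cross_percent(pairs, percent="row")
by_column = cross_percent(pairs, percent="column")
```

With these definitions, row percentages sum to 100 within each category of the first field, and column percentages sum to 100 within each category of the second field.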
Returns
None. The clustered bar chart is displayed with matplotlib; the function itself returns nothing.
References
Brase, C. (2009). Understandable statistics (9th ed.). Houghton Mifflin.
Griffith, A. (2007). SPSS for dummies. Wiley.
Kemp, S. M., & Kemp, S. (2004). Business statistics demystified. McGraw-Hill.
Upton, G., & Cook, I. (2014). Oxford: Dictionary of statistics (3rd ed.). Oxford University Press.
Zedeck, S. (Ed.). (2014). APA dictionary of statistics and research methods. American Psychological Association.
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
Example 1: Clustered bar chart with percentages
>>> import pandas as pd
>>> file1 = "https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv"
>>> df1 = pd.read_csv(file1, sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> vi_bar_clustered(df1['mar1'], df1['sex'], percent="column")
Example 2: Specify order
>>> order1 = ["DIVORCED", "WIDOWED", "SEPARATED", "MARRIED", "NEVER MARRIED"]
>>> order2 = ["MALE", "FEMALE"]
>>> vi_bar_clustered(df1['mar1'], df1['sex'], order1=order1, order2=order2)