Module stikpetP.visualisations.vis_butterfly_chart
import matplotlib.pyplot as plt
import pandas as pd
from ..other.table_cross import tab_cross

def vi_butterfly_chart(field1, field2, categories1=None, categories2=None, variation='butterfly'):
    '''
    Butterfly Chart / Tornado Chart / Pyramid Chart
    -----------------------------------------------
    A special case of a diverging bar chart, used when only two categories are compared.

    Depending on how the bars are ordered, the chart goes by different names. I've chosen to use 'butterfly' if no ordering is done, 'pyramid' if the bars are ordered by their total from small (top) to large (bottom), and 'tornado' when going from large (top) to small (bottom).

    This function is shown in this [YouTube video](https://youtu.be/f_5dTS5gb-4) and the diagram is also discussed at [PeterStatistics.com](https://peterstatistics.com/Terms/Visualisations/PyramidChart.html)

    Parameters
    ----------
    field1 : pandas series
        data with categories for the rows
    field2 : pandas series
        data with categories for the columns
    categories1 : list or dictionary, optional
        the categories to use from field1.
    categories2 : list or dictionary, optional
        the two categories to use from field2.
    variation : {"butterfly", "tornado", "pyramid"}, optional
        order of the bars

    Returns
    -------
    None
        the chart is drawn with matplotlib and displayed with `plt.show()`

    Notes
    -----
    The term *butterfly chart* can for example be found in Hwang and Yoon (2021, p. 25).

    The term *tornado diagram* can be found in the guide from the Project Management Institute (2013, p. 338). The term *funnel chart* is also sometimes used (for example Jamsa (2020, p. 135)), but a similar name (funnel plot) is also used for an analytical type of scatterplot, for example in meta-analysis.

    The term *pyramid chart* can for example be found in Schwabish (2021, p. 185). It is commonly used to compare age distributions.

    References
    ----------
    Hwang, J., & Yoon, Y. (2021). Data analytics and visualization in quality analysis using Tableau. CRC Press.

    Jamsa, K. (2020). Introduction to data mining and analytics: With machine learning in R and Python. Jones & Bartlett Learning.

    Project Management Institute (Ed.). (2013). A guide to the project management body of knowledge (5th ed.). Project Management Institute, Inc.

    Schwabish, J. (2021). Better data visualizations: A guide for scholars, researchers, and wonks. Columbia University Press.

    Author
    ------
    Made by P. Stikker

    Companion website: https://PeterStatistics.com
    YouTube channel: https://www.youtube.com/stikpet
    Donations: https://www.patreon.com/bePatron?u=19398076

    Examples
    --------
    >>> file1 = "https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv"
    >>> df1 = pd.read_csv(file1, sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
    >>> vi_butterfly_chart(df1['mar1'], df1['sex'])
    >>> vi_butterfly_chart(df1['mar1'], df1['sex'], variation="pyramid")
    >>> vi_butterfly_chart(df1['mar1'], df1['sex'], variation="tornado")
    '''
    # cross table of counts without totals; k is the number of row categories
    ct = tab_cross(field1, field2, order1=categories1, order2=categories2, percent=None, totals="exclude")
    k = len(ct.index)

    if variation=='tornado':
        # include the totals so the rows can be sorted on them (ascending)
        ct = tab_cross(field1, field2, order1=categories1, order2=categories2, percent=None, totals="include")
        ct = ct.sort_values(by=['Total'])
        # the grand-total row has the largest total and ends up last:
        # keep the first k rows and the first two columns
        ct = ct.iloc[0:k, 0:2]

    if variation=='pyramid':
        # include the totals so the rows can be sorted on them (descending)
        ct = tab_cross(field1, field2, order1=categories1, order2=categories2, percent=None, totals="include")
        ct = ct.sort_values(by=['Total'], ascending=False)
        # the grand-total row now comes first: skip it, keep the next k rows and the first two columns
        ct = ct.iloc[1:1+k, 0:2]

    y = ct.index
    scores1 = ct.iloc[:,0]
    scores2 = ct.iloc[:,1]

    # largest single count, used so both halves share the same x-axis range
    maxCount = max(ct.max(axis=1))
    xLim = maxCount + 0.5

    fig, axes = plt.subplots(ncols=2, sharey=True, figsize=(9, 6))
    axes[0].barh(y, scores1, align='center', color='royalblue')
    axes[0].set(title=ct.columns[0])
    axes[1].barh(y, scores2, align='center', color='orange')
    axes[1].set(title=ct.columns[1])
    axes[1].grid()

    # mirror the left panel so its bars grow towards the left
    axes[0].invert_xaxis()
    axes[0].grid()
    axes[0].set_xlim([xLim,0])
    axes[1].set_xlim([0,xLim])

    # remove the gap between the two panels
    plt.subplots_adjust(wspace=0, hspace=0)
    plt.show()
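The function draws two horizontal bar charts that share the category axis, with the left panel's x-axis reversed so its bars grow towards the left. A minimal standalone sketch of that technique follows; it assumes `pd.crosstab` as a stand-in for the package's `tab_cross` helper, the name `butterfly_sketch` is hypothetical, and unlike the original it returns the figure instead of calling `plt.show()`, so the chart can be saved or inspected.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def butterfly_sketch(field1, field2):
    """Hypothetical helper: mirrored horizontal bars for the first two columns of a crosstab."""
    # counts per row category (field1) and column category (field2); stand-in for tab_cross
    ct = pd.crosstab(field1, field2).iloc[:, 0:2]
    # string labels, so matplotlib places the bars in table order
    y = ct.index.astype(str)

    # a shared limit keeps both halves on the same scale
    x_lim = ct.to_numpy().max() + 0.5

    fig, axes = plt.subplots(ncols=2, sharey=True, figsize=(9, 6))
    axes[0].barh(y, ct.iloc[:, 0], align='center', color='royalblue')
    axes[0].set(title=str(ct.columns[0]))
    axes[0].set_xlim([x_lim, 0])  # reversed limits: the left bars grow towards the left
    axes[0].grid()

    axes[1].barh(y, ct.iloc[:, 1], align='center', color='orange')
    axes[1].set(title=str(ct.columns[1]))
    axes[1].set_xlim([0, x_lim])
    axes[1].grid()

    fig.subplots_adjust(wspace=0)  # let the two panels touch in the middle
    return fig
```

Converting the index to strings makes matplotlib treat the labels as categories placed in table order; with numeric categories, `barh` would instead place each bar at its numeric value, so any sorting of the rows would not show up in the chart.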
Functions
def vi_butterfly_chart(field1, field2, categories1=None, categories2=None, variation='butterfly')
Butterfly Chart / Tornado Chart / Pyramid Chart
A special case of a diverging bar chart, used when only two categories are compared.
Depending on how the bars are ordered, the chart goes by different names. I've chosen to use 'butterfly' if no ordering is done, 'pyramid' if the bars are ordered by their total from small (top) to large (bottom), and 'tornado' when going from large (top) to small (bottom).
This function is shown in this YouTube video (https://youtu.be/f_5dTS5gb-4) and the diagram is also discussed at PeterStatistics.com (https://peterstatistics.com/Terms/Visualisations/PyramidChart.html).
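The three variations differ only in the order of the rows of the count table before plotting. A minimal sketch of that step, using `pd.crosstab` in place of `tab_cross` and a hypothetical helper `order_rows`: for text labels, `barh` draws the first row at the bottom, so an ascending sort puts the largest totals at the top (tornado) and a descending sort puts them at the bottom (pyramid). One difference from the original: the totals here are summed over the two plotted columns only, whereas the 'Total' column from `tab_cross` covers all columns in the cross table.

```python
import pandas as pd


def order_rows(field1, field2, variation="butterfly"):
    """Hypothetical helper: return the two-column count table in plotting order."""
    ct = pd.crosstab(field1, field2).iloc[:, 0:2]
    totals = ct.sum(axis=1)  # row totals over the two plotted groups
    if variation == "tornado":
        # ascending: smallest total first, i.e. drawn at the bottom by barh
        ct = ct.loc[totals.sort_values().index]
    elif variation == "pyramid":
        # descending: largest total first, i.e. drawn at the bottom by barh
        ct = ct.loc[totals.sort_values(ascending=False).index]
    # "butterfly": keep the table's own row order
    return ct
```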
Parameters
field1 : pandas series
    data with categories for the rows
field2 : pandas series
    data with categories for the columns
categories1 : list or dictionary, optional
    the categories to use from field1.
categories2 : list or dictionary, optional
    the two categories to use from field2.
variation : {"butterfly", "tornado", "pyramid"}, optional
    order of the bars
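Since a butterfly chart compares exactly two groups, `categories2` is meant to pick those two categories from field2. A minimal sketch of that selection, assuming a list of two categories (how `tab_cross` handles a dictionary is not shown here) and using `pd.crosstab` with the hypothetical helper name `two_group_table`:

```python
import pandas as pd


def two_group_table(field1, field2, groups):
    """Hypothetical helper: counts of field1 for exactly two chosen categories of field2."""
    if len(groups) != 2:
        raise ValueError("a butterfly chart compares exactly two groups")
    keep = field2.isin(groups)  # drop cases outside the two groups
    ct = pd.crosstab(field1[keep], field2[keep])
    # put the columns in the requested order; a group without cases becomes a column of zeros
    return ct.reindex(columns=list(groups), fill_value=0)
```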
Returns
None
    the chart is drawn with matplotlib and displayed with `plt.show()`
Notes
The term butterfly chart can for example be found in Hwang and Yoon (2021, p. 25).
The term tornado diagram can be found in the guide from the Project Management Institute (2013, p. 338). The term funnel chart is also sometimes used (for example Jamsa (2020, p. 135)), but a similar name (funnel plot) is also used for an analytical type of scatterplot, for example in meta-analysis.
The term pyramid chart can for example be found in Schwabish (2021, p. 185). It is commonly used to compare age distributions.
References
Hwang, J., & Yoon, Y. (2021). Data analytics and visualization in quality analysis using Tableau. CRC Press.
Jamsa, K. (2020). Introduction to data mining and analytics: With machine learning in R and Python. Jones & Bartlett Learning.
Project Management Institute (Ed.). (2013). A guide to the project management body of knowledge (5th ed.). Project Management Institute, Inc.
Schwabish, J. (2021). Better data visualizations: A guide for scholars, researchers, and wonks. Columbia University Press.
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
>>> file1 = "https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv"
>>> df1 = pd.read_csv(file1, sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> vi_butterfly_chart(df1['mar1'], df1['sex'])
>>> vi_butterfly_chart(df1['mar1'], df1['sex'], variation="pyramid")
>>> vi_butterfly_chart(df1['mar1'], df1['sex'], variation="tornado")