Module stikpetP.visualisations.vis_spine_plot
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from ..other.table_cross import tab_cross
def vi_spine_plot(field1, field2, categories1=None, categories2=None):
    '''
    Spine Plot / Marimekko Chart / Mosaic Plot
    ------------------------------------------
    A spine plot is similar to a multiple stacked bar-chart, but "the difference is that the bars fill the plot vertically so the shading gives us proportions instead of counts. Also, the width of each bar varies, reflecting the marginal proportion of observations in each workshop" (Muenchen, 2006, p. 286)

    It is a chart you could use when you have two nominal variables and do not have a clear independent and dependent variable. Otherwise a multiple/clustered bar-chart might be preferred.

    Parameters
    ----------
    field1 : pandas series
        data with categories for the rows
    field2 : pandas series
        data with categories for the columns
    categories1 : list or dictionary, optional
        the categories to use from field1.
    categories2 : list or dictionary, optional
        the categories to use from field2.

    Returns
    -------
    spine plot

    Notes
    -----
    The naming of this diagram is unfortunately not very clear. I use the term 'spine plot' as a special case of a Mosaic Plot. Mosaic Plots are often attributed to Hartigan and Kleiner (for example by Friendly (2002, p. 90)). Earlier versions are actually known, for example Walker (1874, p. PI XX). Hartigan and Kleiner (1981) start their paper with a Mosaic Plot for a cross table, but end it with showing Mosaic Plots for multiple dimension cross tables.

    A Marimekko Chart is simply an alternative name for the Mosaic Plot, although according to Wikipedia "mosaic plots can be colored and shaded according to deviations from independence, whereas Marimekko charts are colored according to the category levels" (Wikipedia, 2022).

    The term 'Spine Plot' itself is often attributed to Hummel, but I've been unable to hunt down his original article: Linked bar charts: Analysing categorical data graphically. Computational Statistics 11: 23–33.

    References
    ----------
    Carvalho, T. (2021, April 10). Marimekko Charts with Python’s Matplotlib. Medium. https://towardsdatascience.com/marimekko-charts-with-pythons-matplotlib-6b9784ae73a1

    Friendly, M. (2002). A brief history of the mosaic display. *Journal of Computational and Graphical Statistics, 11*(1), 89–107. https://doi.org/10.1198/106186002317375631

    Hartigan, J. A., & Kleiner, B. (1981). Mosaics for contingency tables. In W. F. Eddy (Ed.), Proceedings of the 13th Symposium on the Interface (pp. 268–273). Springer. https://doi.org/10.1007/978-1-4613-9464-8_37

    Muenchen, R. A. (2009). *R for SAS and SPSS Users*. Springer.

    Walker, F. A. (1874). *Statistical atlas of the United States based on the results of the ninth census 1870*. Census Office.

    Wikipedia. (2022). Mosaic plot. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Mosaic_plot&oldid=1089465331

    Author
    ------
    Made by P. Stikker

    Companion website: https://PeterStatistics.com
    YouTube channel: https://www.youtube.com/stikpet
    Donations: https://www.patreon.com/bePatron?u=19398076

    Examples
    --------
    >>> file1 = "https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv"
    >>> df1 = pd.read_csv(file1, sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
    >>> vi_spine_plot(df1['mar1'], df1['sex'])

    '''
    # Cross table of counts: each row of ct becomes one bar, each column one stacked segment
    ct = tab_cross(field2, field1, order1=categories2, order2=categories1, percent=None, totals="exclude")

    # Row totals determine the bar widths (as proportions of the grand total)
    x = np.array(ct.sum(axis=1))
    x_label = np.array(ct.index)
    k1 = len(x_label)
    width = x/sum(x)

    # Left edge of each bar: running sum of the widths of the bars before it
    adjusted_x, temp = [0], 0
    for i in width[:-1]:
        temp += i
        adjusted_x.append(temp)

    # Row percentages converted to proportions give the segment heights
    ct_rowProp = tab_cross(field2, field1, order1=categories2, order2=categories1, percent="row", totals="exclude")/100
    legend_labels = list(ct.columns)
    k2 = len(legend_labels)

    # Start with zeros so the cumulative sum yields the bottom of each segment
    ys = [np.zeros(k1)]
    for i in range(0,k2):
        ys.append(np.array(ct_rowProp.iloc[:,i]))
    y_bottom = np.array(ys).cumsum(axis=0)

    # Draw one set of stacked segments per column of the cross table
    fig, ax = plt.subplots(1)
    for i in range(0,k2):
        plt.bar(adjusted_x, ys[i+1], bottom=y_bottom[i], width=width, align='edge', edgecolor='black')

    # Both axes run from 0% to 100%
    ax.set_yticks([0, 0.25, 0.5, 0.75, 1])
    ax.set_yticklabels(['0%', '25%', '50%', '75%', '100%'])
    ax.set_xticks([0, 0.25, 0.5, 0.75, 1])
    ax.set_xticklabels(['0%', '25%', '50%', '75%', '100%'])
    plt.ylim(0,1)
    plt.xlim(0,1)
    plt.legend(legend_labels)

    # Second x-axis on top, with the category labels centred on each bar
    axy = ax.twiny()
    axy.set_xticks([(width[i]/2)+ v for i, v in enumerate(adjusted_x)])
    axy.set_xticklabels(x_label, fontsize=14)

    plt.show()
    return
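The loop that builds `adjusted_x` in the source above accumulates the bar widths to obtain each bar's left edge. As a minimal sketch, assuming hypothetical row totals of 5, 3 and 2, the same values can be obtained with `np.cumsum`; the names `left_edges` and `centres` are illustrative and do not appear in the package.

```python
import numpy as np

# Hypothetical row totals of a cross table (not taken from the example data).
x = np.array([5, 3, 2])
width = x / x.sum()

# Loop form, as used in vi_spine_plot.
adjusted_x, temp = [0], 0
for w in width[:-1]:
    temp += w
    adjusted_x.append(temp)

# Vectorised form: a leading zero followed by the running sum of all but the last width.
left_edges = np.concatenate(([0.0], np.cumsum(width)[:-1]))

# Bar centres, where the function places the category labels on the top axis.
centres = left_edges + width / 2

print(left_edges)
print(centres)
```

Both forms give the same left edges; the vectorised one simply avoids the explicit accumulator.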
Functions
def vi_spine_plot(field1, field2, categories1=None, categories2=None)
Spine Plot / Marimekko Chart / Mosaic Plot
A spine plot is similar to a multiple stacked bar-chart, but "the difference is that the bars fill the plot vertically so the shading gives us proportions instead of counts. Also, the width of each bar varies, reflecting the marginal proportion of observations in each workshop" (Muenchen, 2006, p. 286)
It is a chart you could use when you have two nominal variables and do not have a clear independent and dependent variable. Otherwise a multiple/clustered bar-chart might be preferred.
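To make the two quantities in that description concrete, here is a minimal sketch that computes them for a small, made-up data set. It uses `pandas.crosstab` as a stand-in for the package's own `tab_cross`, and the column names `sex` and `status` are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy data; values and column names are illustrative only.
df = pd.DataFrame({
    "sex": ["F", "F", "F", "M", "M", "F", "M", "F"],
    "status": ["married", "single", "married", "single",
               "single", "single", "married", "married"],
})

# Counts: one row per bar, one column per stacked segment.
counts = pd.crosstab(df["sex"], df["status"])

# Bar widths: the marginal proportion of each row category.
widths = counts.sum(axis=1) / counts.to_numpy().sum()

# Segment heights: proportions within each bar, so every row sums to 1.
heights = counts.div(counts.sum(axis=1), axis=0)

print(widths)
print(heights)
```

Because every bar is rescaled to a height of 1, the area of each segment is proportional to the count in the corresponding cell.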
Parameters
field1 : pandas series
    data with categories for the rows
field2 : pandas series
    data with categories for the columns
categories1 : list or dictionary, optional
    the categories to use from field1.
categories2 : list or dictionary, optional
    the categories to use from field2.
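How `tab_cross` handles `categories1` and `categories2` internally is not shown on this page. As a rough approximation of what passing a list might achieve, the sketch below restricts a series to a chosen set of categories and fixes their order using plain pandas. The variable names and data are hypothetical, and the dictionary form is not covered.

```python
import pandas as pd

# Hypothetical data and category selection (illustrative only).
field = pd.Series(["a", "b", "c", "a", "b", "d"])
wanted = ["b", "a"]

# Keep only the requested categories.
subset = field[field.isin(wanted)]

# Count them and report the counts in the requested order.
counts = subset.value_counts().reindex(wanted, fill_value=0)

print(counts)
```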
Returns
spine plot
Notes
The naming of this diagram is unfortunately not very clear. I use the term 'spine plot' as a special case of a Mosaic Plot. Mosaic Plots are often attributed to Hartigan and Kleiner (for example by Friendly (2002, p. 90)). Earlier versions are actually known, for example Walker (1874, p. PI XX). Hartigan and Kleiner (1981) start their paper with a Mosaic Plot for a cross table, but end it with showing Mosaic Plots for multiple dimension cross tables.
A Marimekko Chart is simply an alternative name for the Mosaic Plot, although according to Wikipedia "mosaic plots can be colored and shaded according to deviations from independence, whereas Marimekko charts are colored according to the category levels" (Wikipedia, 2022).
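The quoted distinction rests on "deviations from independence". `vi_spine_plot` colours by category level and does not compute these, but one common way to quantify such deviations is with Pearson residuals. The sketch below is an assumption about what that shading could be based on, using a hypothetical 2x2 table of counts.

```python
import numpy as np

# Hypothetical observed counts (rows: bars, columns: stacked segments).
observed = np.array([[30.0, 10.0],
                     [20.0, 40.0]])

n = observed.sum()
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)

# Expected counts if the two variables were independent.
expected = row_totals * col_totals / n

# Pearson residuals: positive where a cell holds more observations
# than independence would predict, negative where it holds fewer.
residuals = (observed - expected) / np.sqrt(expected)

print(np.round(residuals, 2))
```

A mosaic plot shaded this way would map each residual to a colour intensity, whereas the Marimekko-style colouring used here depends only on the column category.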
The term 'Spine Plot' itself is often attributed to Hummel, but I've been unable to hunt down his original article: Linked bar charts: Analysing categorical data graphically. Computational Statistics 11: 23–33.
References
Carvalho, T. (2021, April 10). Marimekko Charts with Python’s Matplotlib. Medium. https://towardsdatascience.com/marimekko-charts-with-pythons-matplotlib-6b9784ae73a1
Friendly, M. (2002). A brief history of the mosaic display. Journal of Computational and Graphical Statistics, 11(1), 89–107. https://doi.org/10.1198/106186002317375631
Hartigan, J. A., & Kleiner, B. (1981). Mosaics for contingency tables. In W. F. Eddy (Ed.), Proceedings of the 13th Symposium on the Interface (pp. 268–273). Springer. https://doi.org/10.1007/978-1-4613-9464-8_37
Muenchen, R. A. (2009). R for SAS and SPSS Users. Springer.
Walker, F. A. (1874). Statistical atlas of the United States based on the results of the ninth census 1870. Census Office.
Wikipedia. (2022). Mosaic plot. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Mosaic_plot&oldid=1089465331
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
>>> file1 = "https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv"
>>> df1 = pd.read_csv(file1, sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> vi_spine_plot(df1['mar1'], df1['sex'])