Module stikpetP.visualisations.vis_pie
import matplotlib.pyplot as plt
import pandas as pd
def vi_pie(data, labels="count"):
    '''
Pie Chart
---------
A pie-chart is a “graphic display in which a circle is cut into wedges with the area of each wedge being proportional to the percentage of cases in the category represented by that wedge” (Zedeck, 2014, p. 260).
A video on pie charts is available [here](https://youtu.be/e6JtJsh-6iw).
This function is shown in this [YouTube video](https://youtu.be/oP4Tyl3u5Vc) and the visualisation is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Visualisations/PieChart.html)
Parameters
----------
data : list or pandas series
labels : {'count', 'percent', 'both', 'none'}, optional
what to show besides the labels. Default is 'count'.
Notes
-----
It is possible to show only the labels (labels="none"), the counts (labels="count"), the percentages (labels="percent"), or both count and percent (labels="both").
The function uses the pandas *plot* method with kind='pie', which draws the chart with matplotlib. The first wedge starts at 90 degrees (the top of the circle) and the wedges are placed counterclockwise.
The pie-chart is quite popular and often used, but it has a few disadvantages. First, it can only show relative frequencies: a circle has 360 degrees, equal to 100%, so the angle for each category is found by multiplying its relative frequency by 360. To show other frequencies, such as counts, the numbers themselves have to be added as labels.
A second disadvantage is that when the relative frequencies are close to each other, the differences are not easily seen in a circle diagram.
A third disadvantage is that when there are many categories, the circle diagram looks very busy and is not easy to read.
People also have more difficulty comparing areas and angles (what you do when looking at a pie-chart) than comparing heights (what is done with a bar-chart).
A 3D effect is also often added, but this makes comparing the slices even more difficult.
The earliest circle diagram found is on the inlay of a book by William Playfair (1801). The name 'pie chart' might come from a misspelling of the word Pi, which is often associated with a circle. It might also simply come from the resemblance to a pie (as in apple-pie). However, Srivastava and Rego (2011) put forward another belief: that it is named after a royal French cook named Pie, who served dishes in a pie-chart shape.
See Also
--------
Before the visualisation you might first want to get an impression using a frequency table:
* [tab_frequency](../other/table_frequency.html#tab_frequency)
After visualisation you might want some descriptive measures:
* [me_mode](../measures/meas_mode.html#me_mode) for the mode
* [me_qv](../measures/meas_qv.html#me_qv) for Measures of Qualitative Variation
or perform a test:
* [ts_pearson_gof](../tests/test_pearson_gof.html#ts_pearson_gof) for Pearson Chi-Square Goodness-of-Fit Test
* [ts_freeman_tukey_gof](../tests/test_freeman_tukey_gof.html#ts_freeman_tukey_gof) for Freeman-Tukey Test of Goodness-of-Fit
* [ts_freeman_tukey_read](../tests/test_freeman_tukey_read.html#ts_freeman_tukey_read) for Freeman-Tukey-Read Test of Goodness-of-Fit
* [ts_g_gof](../tests/test_g_gof.html#ts_g_gof) for G (Likelihood Ratio) Goodness-of-Fit Test
* [ts_mod_log_likelihood_gof](../tests/test_mod_log_likelihood_gof.html#ts_mod_log_likelihood_gof) for Mod-Log Likelihood Test of Goodness-of-Fit
* [ts_multinomial_gof](../tests/test_multinomial_gof.html#ts_multinomial_gof) for Multinomial Goodness-of-Fit Test
* [ts_neyman_gof](../tests/test_neyman_gof.html#ts_neyman_gof) for Neyman Test of Goodness-of-Fit
* [ts_powerdivergence_gof](../tests/test_powerdivergence_gof.html#ts_powerdivergence_gof) for Power Divergence GoF Test
Alternatives for this visualisation could be:
* [vi_bar_simple](../visualisations/vis_bar_simple.html#vi_bar_simple) for Simple Bar Chart
* [vi_cleveland_dot_plot](../visualisations/vis_cleveland_dot_plot.html#vi_cleveland_dot_plot) for Cleveland Dot Plot
* [vi_dot_plot](../visualisations/vis_dot_plot.html#vi_dot_plot) for Dot Plot
* [vi_pareto_chart](../visualisations/vis_pareto_chart.html#vi_pareto_chart) for Pareto Chart
References
----------
Playfair, W. (1801). *The statistical breviary: Shewing the resources of every state and kingdom*. T. Bensley. http://archive.org/details/statisticalbrev00playgoog
Srivastava, T. N., & Rego, S. (2011). *Business research methodology*. Tata McGraw-Hill.
Zedeck, S. (Ed.). (2014). *APA dictionary of statistics and research methods*. American Psychological Association.
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
--------
Example 1: pandas series
>>> df1 = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv', sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = df1['mar1']
>>> vi_pie(ex1);
>>> vi_pie(ex1, labels="percent");
>>> vi_pie(ex1, labels="none");
>>> vi_pie(ex1, labels="both");
Example 2: a list
>>> ex2 = ["MARRIED", "DIVORCED", "MARRIED", "SEPARATED", "DIVORCED", "NEVER MARRIED", "DIVORCED", "DIVORCED", "NEVER MARRIED", "MARRIED", "MARRIED", "MARRIED", "SEPARATED", "DIVORCED", "NEVER MARRIED", "NEVER MARRIED", "DIVORCED", "DIVORCED", "MARRIED"]
>>> vi_pie(ex2);
    '''
    # Accept a plain list by converting it to a pandas Series
    if isinstance(data, list):
        data = pd.Series(data)

    # Count how often each category occurs
    freq = data.value_counts()

    # autopct receives each wedge's percentage (0-100); counts are recovered from it
    if labels == "none":
        freq.plot(kind='pie', ylabel="", startangle=90)
    elif labels == "percent":
        freq.plot(kind='pie', ylabel="", startangle=90, autopct='%1.1f%%')
    elif labels == "count":
        freq.plot(kind='pie', ylabel="", startangle=90, autopct=lambda x: str(round(x*freq.sum()/100)))
    elif labels == "both":
        freq.plot(kind='pie', ylabel="", startangle=90, autopct=lambda x: str(round(x*freq.sum()/100)) + "; " + str(round(x,1)) + "%")

    # plt.show must be called; the bare name does nothing
    plt.show()
    return
Functions
def vi_pie(data, labels='count')
Pie Chart
A pie-chart is a “graphic display in which a circle is cut into wedges with the area of each wedge being proportional to the percentage of cases in the category represented by that wedge” (Zedeck, 2014, p. 260).
A video on pie charts is available here.
This function is shown in this YouTube video and the visualisation is also described at PeterStatistics.com
Parameters
data : list or pandas series
labels : {'count', 'percent', 'both', 'none'}, optional
what to show besides the labels
Notes
It is possible to show only the labels (labels="none"), the counts (labels="count"), the percentages (labels="percent"), or both count and percent (labels="both").
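The label options can be illustrated without drawing anything. The sketch below assumes a hypothetical helper, make_autopct, that is not part of the package; it mirrors how a formatter passed to matplotlib's autopct receives each wedge's percentage (0 to 100) and can recover the count from it. The percent formatting here uses an f-string, which is an illustrative choice rather than the function's exact code.

```python
def make_autopct(total, labels="count"):
    """Return a formatter suitable for matplotlib's autopct (hypothetical helper)."""
    def fmt(pct):
        # matplotlib hands autopct the wedge's share as a percentage (0-100)
        count = round(pct * total / 100)
        if labels == "count":
            return str(count)
        if labels == "percent":
            return f"{pct:.1f}%"
        if labels == "both":
            return f"{count}; {pct:.1f}%"
        raise ValueError(f"unsupported labels option: {labels!r}")
    return fmt


# A category with 7 of 19 cases covers 7/19 of the circle
pct = 7 / 19 * 100
count_label = make_autopct(19, "count")(pct)
percent_label = make_autopct(19, "percent")(pct)
both_label = make_autopct(19, "both")(pct)
print(count_label, percent_label, both_label, sep=" | ")
```

For labels="none" no formatter is needed at all, since the category names are drawn by default.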
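The function uses the pandas plot method with kind='pie', which draws the chart with matplotlib. The first wedge starts at 90 degrees (the top of the circle) and the wedges are placed counterclockwise.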
The pie-chart is quite popular and often used, but it has a few disadvantages. First, it can only show relative frequencies: a circle has 360 degrees, equal to 100%, so the angle for each category is found by multiplying its relative frequency by 360. To show other frequencies, such as counts, the numbers themselves have to be added as labels.
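The degree calculation described above can be sketched in a few lines. The function name wedge_angles and the example counts are made up for illustration and are not part of the package.

```python
def wedge_angles(counts):
    """Map each category to its wedge angle in degrees (hypothetical helper)."""
    total = sum(counts.values())
    # relative frequency (n / total) times 360 degrees
    return {category: 360 * n / total for category, n in counts.items()}


# Made-up counts for three categories, 10 cases in total
example_counts = {"A": 5, "B": 3, "C": 2}
angles = wedge_angles(example_counts)
print(angles)  # {'A': 180.0, 'B': 108.0, 'C': 72.0}
```

Doubling every count would give exactly the same angles, which is why the chart on its own cannot convey absolute frequencies.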
A second disadvantage is that when the relative frequencies are close to each other, the differences are not easily seen in a circle diagram.
A third disadvantage is that when there are many categories, the circle diagram looks very busy and is not easy to read.
People also have more difficulty comparing areas and angles (what you do when looking at a pie-chart) than comparing heights (what is done with a bar-chart).
A 3D effect is also often added, but this makes comparing the slices even more difficult.
The earliest circle diagram found is on the inlay of a book by William Playfair (1801). The name 'pie chart' might come from a misspelling of the word Pi, which is often associated with a circle. It might also simply come from the resemblance to a pie (as in apple-pie). However, Srivastava and Rego (2011) put forward another belief: that it is named after a royal French cook named Pie, who served dishes in a pie-chart shape.
See Also
Before the visualisation you might first want to get an impression using a frequency table:
* [tab_frequency](../other/table_frequency.html#tab_frequency)
After visualisation you might want some descriptive measures:
* [me_mode](../measures/meas_mode.html#me_mode) for the mode
* [me_qv](../measures/meas_qv.html#me_qv) for Measures of Qualitative Variation
or perform a test:
* [ts_pearson_gof](../tests/test_pearson_gof.html#ts_pearson_gof) for Pearson Chi-Square Goodness-of-Fit Test
* [ts_freeman_tukey_gof](../tests/test_freeman_tukey_gof.html#ts_freeman_tukey_gof) for Freeman-Tukey Test of Goodness-of-Fit
* [ts_freeman_tukey_read](../tests/test_freeman_tukey_read.html#ts_freeman_tukey_read) for Freeman-Tukey-Read Test of Goodness-of-Fit
* [ts_g_gof](../tests/test_g_gof.html#ts_g_gof) for G (Likelihood Ratio) Goodness-of-Fit Test
* [ts_mod_log_likelihood_gof](../tests/test_mod_log_likelihood_gof.html#ts_mod_log_likelihood_gof) for Mod-Log Likelihood Test of Goodness-of-Fit
* [ts_multinomial_gof](../tests/test_multinomial_gof.html#ts_multinomial_gof) for Multinomial Goodness-of-Fit Test
* [ts_neyman_gof](../tests/test_neyman_gof.html#ts_neyman_gof) for Neyman Test of Goodness-of-Fit
* [ts_powerdivergence_gof](../tests/test_powerdivergence_gof.html#ts_powerdivergence_gof) for Power Divergence GoF Test
Alternatives for this visualisation could be:
* [vi_bar_simple](../visualisations/vis_bar_simple.html#vi_bar_simple) for Simple Bar Chart
* [vi_cleveland_dot_plot](../visualisations/vis_cleveland_dot_plot.html#vi_cleveland_dot_plot) for Cleveland Dot Plot
* [vi_dot_plot](../visualisations/vis_dot_plot.html#vi_dot_plot) for Dot Plot
* [vi_pareto_chart](../visualisations/vis_pareto_chart.html#vi_pareto_chart) for Pareto Chart
References
Playfair, W. (1801). The statistical breviary: Shewing the resources of every state and kingdom. T. Bensley. http://archive.org/details/statisticalbrev00playgoog
Srivastava, T. N., & Rego, S. (2011). Business research methodology. Tata McGraw-Hill.
Zedeck, S. (Ed.). (2014). APA dictionary of statistics and research methods. American Psychological Association.
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
Example 1: pandas series
>>> df1 = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv', sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = df1['mar1']
>>> vi_pie(ex1);
>>> vi_pie(ex1, labels="percent");
>>> vi_pie(ex1, labels="none");
>>> vi_pie(ex1, labels="both");
Example 2: a list
>>> ex2 = ["MARRIED", "DIVORCED", "MARRIED", "SEPARATED", "DIVORCED", "NEVER MARRIED", "DIVORCED", "DIVORCED", "NEVER MARRIED", "MARRIED", "MARRIED", "MARRIED", "SEPARATED", "DIVORCED", "NEVER MARRIED", "NEVER MARRIED", "DIVORCED", "DIVORCED", "MARRIED"]
>>> vi_pie(ex2);