Module `stikpetP.visualisations.vis_pie`


import matplotlib.pyplot as plt
import pandas as pd

def vi_pie(data, labels="count"):  
    '''
    Pie Chart
    ---------
    
    A pie-chart is a “graphic display in which a circle is cut into wedges with the area of each wedge being proportional to the percentage of cases in the category represented by that wedge” (Zedeck, 2014, p. 260). 
    
    A video on pie charts is available [here](https://youtu.be/e6JtJsh-6iw).

    This function is shown in this [YouTube video](https://youtu.be/oP4Tyl3u5Vc), and the visualisation is also described at [PeterStatistics.com](https://peterstatistics.com/Terms/Visualisations/PieChart.html).
    
    Parameters
    ----------
    data : list or pandas series
    labels : {'count', 'percent', 'both', 'none'}, optional
        what to show in addition to the category labels. Default is 'count'.
    
    Notes
    -----
    It is possible to show only the category labels (labels="none"), or to add the counts (labels="count"), the percentages (labels="percent"), or both count and percent (labels="both").
    
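    The function uses the pandas Series *plot* function with kind='pie', which draws the chart with matplotlib. The first wedge starts at 90 degrees (the top of the circle) and the wedges are drawn counterclockwise.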
    
    The pie-chart is popular and often used, but it has a few disadvantages. A circle has 360 degrees, equal to 100%, so the angle for each category is found by multiplying its relative frequency by 360. This means that visually the pie-chart can only show relative frequencies. To show other frequencies, such as counts, the numbers themselves have to be added as labels.
    
    Another disadvantage is when the relative frequencies are close to each other, the differences are not easily seen in a circle diagram.
    
    As a third disadvantage, when there are many categories the circle diagram will look very busy and will not be easy to read.
    
    People also have more difficulty with comparing areas and angles (what you do when looking at a pie-chart) than comparing heights (what is done with a bar-chart).
    
    Also often a 3D effect is added, but this actually makes comparisons of the slices even more difficult.
    
    The earliest known circle diagram appears on the inlay of a book by William Playfair (1801). The name 'pie chart' might come from a misspelling of the word Pi, which is often associated with a circle. It might also simply come from the resemblance to a pie (as in apple-pie). However, Srivastava and Rego (2011) put forward another belief: that it is named after a royal French cook, Pie, who served dishes in a pie-chart shape.

    See Also
    --------
    Before the visualisation you might first want to get an impression using a frequency table:
    * [tab_frequency](../other/table_frequency.html#tab_frequency)

    After visualisation you might want some descriptive measures:
    * [me_mode](../measures/meas_mode.html#me_mode) for the mode
    * [me_qv](../measures/meas_qv.html#me_qv) for Measures of Qualitative Variation

    or perform a test:
    * [ts_pearson_gof](../tests/test_pearson_gof.html#ts_pearson_gof) for Pearson Chi-Square Goodness-of-Fit Test
    * [ts_freeman_tukey_gof](../tests/test_freeman_tukey_gof.html#ts_freeman_tukey_gof) for Freeman-Tukey Test of Goodness-of-Fit
    * [ts_freeman_tukey_read](../tests/test_freeman_tukey_read.html#ts_freeman_tukey_read) for Freeman-Tukey-Read Test of Goodness-of-Fit
    * [ts_g_gof](../tests/test_g_gof.html#ts_g_gof) for G (Likelihood Ratio) Goodness-of-Fit Test
    * [ts_mod_log_likelihood_gof](../tests/test_mod_log_likelihood_gof.html#ts_mod_log_likelihood_gof) for Mod-Log Likelihood Test of Goodness-of-Fit
    * [ts_multinomial_gof](../tests/test_multinomial_gof.html#ts_multinomial_gof) for Multinomial Goodness-of-Fit Test
    * [ts_neyman_gof](../tests/test_neyman_gof.html#ts_neyman_gof) for Neyman Test of Goodness-of-Fit
    * [ts_powerdivergence_gof](../tests/test_powerdivergence_gof.html#ts_powerdivergence_gof) for Power Divergence GoF Test
    
    Alternatives for this visualisation could be:
    * [vi_bar_simple](../visualisations/vis_bar_simple.html#vi_bar_simple) for Simple Bar Chart
    * [vi_cleveland_dot_plot](../visualisations/vis_cleveland_dot_plot.html#vi_cleveland_dot_plot) for Cleveland Dot Plot
    * [vi_dot_plot](../visualisations/vis_dot_plot.html#vi_dot_plot) for Dot Plot
    * [vi_pareto_chart](../visualisations/vis_pareto_chart.html#vi_pareto_chart) for Pareto Chart
    
    References
    ----------
    Playfair, W. (1801). *The statistical breviary: Shewing the resources of every state and kingdom*. T. Bensley. http://archive.org/details/statisticalbrev00playgoog
    
    Srivastava, T. N., & Rego, S. (2011). *Business research methodology*. Tata McGraw-Hill.
    
    Zedeck, S. (Ed.). (2014). *APA dictionary of statistics and research methods*. American Psychological Association.
    
    Author
    ------
    Made by P. Stikker
    
    Companion website: https://PeterStatistics.com  
    YouTube channel: https://www.youtube.com/stikpet  
    Donations: https://www.patreon.com/bePatron?u=19398076
    
    Examples
    --------
    Example 1: pandas series
    >>> df1 = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv', sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
    >>> ex1 = df1['mar1']
    >>> vi_pie(ex1);
    >>> vi_pie(ex1, labels="percent");
    >>> vi_pie(ex1, labels="none");
    >>> vi_pie(ex1, labels="both");
    
    Example 2: a list
    >>> ex2 = ["MARRIED", "DIVORCED", "MARRIED", "SEPARATED", "DIVORCED", "NEVER MARRIED", "DIVORCED", "DIVORCED", "NEVER MARRIED", "MARRIED", "MARRIED", "MARRIED", "SEPARATED", "DIVORCED", "NEVER MARRIED", "NEVER MARRIED", "DIVORCED", "DIVORCED", "MARRIED"]
    >>> vi_pie(ex2);
    
    '''
    
    
    # Accept plain lists by converting them to a pandas Series
    if isinstance(data, list):
        data = pd.Series(data)
        
    # Frequency of each category
    freq = data.value_counts()
    
    # autopct receives each wedge's percentage; counts are recovered from it
    if labels=="none":
        freq.plot(kind='pie', ylabel="", startangle=90)
    elif labels=="percent":
        freq.plot(kind='pie', ylabel="", startangle=90, autopct='%1.1f%%')
    elif labels=="count":
        freq.plot(kind='pie', ylabel="", startangle=90, autopct=lambda x: str(round(x*freq.sum()/100)))
    elif labels=="both":
        freq.plot(kind='pie', ylabel="", startangle=90, autopct=lambda x: str(round(x*freq.sum()/100)) + "; " + str(round(x,1)) + "%")
        
    # Call show() so the chart is actually displayed
    plt.show()
    
    return

Functions

def vi_pie(data, labels='count')

Pie Chart

A pie-chart is a “graphic display in which a circle is cut into wedges with the area of each wedge being proportional to the percentage of cases in the category represented by that wedge” (Zedeck, 2014, p. 260).

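A video on pie charts is available at https://youtu.be/e6JtJsh-6iw.

This function is shown in a YouTube video (https://youtu.be/oP4Tyl3u5Vc), and the visualisation is also described at PeterStatistics.com (https://peterstatistics.com/Terms/Visualisations/PieChart.html).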

Parameters

data : list or pandas series
labels : {'count', 'percent', 'both', 'none'}, optional: what to show in addition to the category labels. Default is 'count'.

Notes

It is possible to show only the category labels (labels="none"), or to add the counts (labels="count"), the percentages (labels="percent"), or both count and percent (labels="both").
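
The count labels can be read as a conversion from a wedge's percentage back to a count, since the plotting routine hands each label callback a percentage rather than a frequency. A minimal sketch of that conversion, outside any plot, might look as follows. The helper names `count_label` and `both_label` and the small data set are made up for illustration and are not part of the package.

```python
import pandas as pd

# Hypothetical data: three categories with counts 3, 2 and 1
freq = pd.Series(["a", "b", "a", "c", "a", "b"]).value_counts()
total = int(freq.sum())

def count_label(pct):
    # pct is a wedge's share in percent; scale it back to a count
    return str(round(float(pct) * total / 100))

def both_label(pct):
    # count followed by the percentage rounded to one decimal
    return count_label(pct) + "; " + str(round(float(pct), 1)) + "%"

# Percentages as they would be passed to the label callback
percentages = freq / total * 100
labels = [both_label(p) for p in percentages]
print(labels)
```

Because the count is recovered by rounding, this approach relies on the percentage being accurate enough to round back to the original whole number.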

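The function uses the pandas Series plot function with kind='pie', which draws the chart with matplotlib. The first wedge starts at 90 degrees (the top of the circle) and the wedges are drawn counterclockwise.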

The pie-chart is popular and often used, but it has a few disadvantages. A circle has 360 degrees, equal to 100%, so the angle for each category is found by multiplying its relative frequency by 360. This means that visually the pie-chart can only show relative frequencies. To show other frequencies, such as counts, the numbers themselves have to be added as labels.
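
The angle calculation described above can be sketched in a few lines. The counts used here are hypothetical and chosen so that the arithmetic is easy to follow.

```python
import pandas as pd

# Hypothetical counts for three categories (total of 10 cases)
freq = pd.Series({"A": 5, "B": 3, "C": 2})

# Relative frequency of each category (sums to 1)
rel_freq = freq / freq.sum()

# Angle of each wedge: relative frequency times 360 degrees
degrees = rel_freq * 360
print(degrees)
```

Doubling every count leaves the angles unchanged, which is why the chart on its own only conveys relative frequencies.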

Another disadvantage is when the relative frequencies are close to each other, the differences are not easily seen in a circle diagram.

As a third disadvantage, when there are many categories the circle diagram will look very busy and will not be easy to read.

People also have more difficulty with comparing areas and angles (what you do when looking at a pie-chart) than comparing heights (what is done with a bar-chart).
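
One way to see this for yourself is to draw the same frequencies as a pie chart and as a bar chart next to each other. The sketch below does so with hypothetical, nearly equal counts. It calls pandas plotting directly rather than `vi_pie`, and it selects the non-interactive Agg backend only so that it also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; omit for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical counts that are close to each other
freq = pd.Series({"A": 26, "B": 25, "C": 24, "D": 25})

# One figure with two panels: pie on the left, bar on the right
fig, (ax_pie, ax_bar) = plt.subplots(1, 2, figsize=(8, 4))
freq.plot(kind="pie", ylabel="", startangle=90, ax=ax_pie)
freq.plot(kind="bar", ax=ax_bar)
ax_bar.set_ylabel("count")

# With an interactive backend, plt.show() would display the figure here
```

In the bar panel the small differences between the categories show up as differences in height, while the wedges in the pie panel look almost identical.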

Also often a 3D effect is added, but this actually makes comparisons of the slices even more difficult.

The earliest known circle diagram appears on the inlay of a book by William Playfair (1801). The name 'pie chart' might come from a misspelling of the word Pi, which is often associated with a circle. It might also simply come from the resemblance to a pie (as in apple-pie). However, Srivastava and Rego (2011) put forward another belief: that it is named after a royal French cook, Pie, who served dishes in a pie-chart shape.

References

Playfair, W. (1801). The statistical breviary: Shewing the resources of every state and kingdom. T. Bensley. http://archive.org/details/statisticalbrev00playgoog

Srivastava, T. N., & Rego, S. (2011). Business research methodology. Tata McGraw-Hill.

Zedeck, S. (Ed.). (2014). APA dictionary of statistics and research methods. American Psychological Association.

Author

Made by P. Stikker

Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076

Examples

Example 1: pandas series

>>> df1 = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/GSS2012a.csv', sep=',', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = df1['mar1']
>>> vi_pie(ex1);
>>> vi_pie(ex1, labels="percent");
>>> vi_pie(ex1, labels="none");
>>> vi_pie(ex1, labels="both");

Example 2: a list

>>> ex2 = ["MARRIED", "DIVORCED", "MARRIED", "SEPARATED", "DIVORCED", "NEVER MARRIED", "DIVORCED", "DIVORCED", "NEVER MARRIED", "MARRIED", "MARRIED", "MARRIED", "SEPARATED", "DIVORCED", "NEVER MARRIED", "NEVER MARRIED", "DIVORCED", "DIVORCED", "MARRIED"]
>>> vi_pie(ex2);
