Module stikpetP.visualisations.vis_boxplot_single
import pandas as pd
import matplotlib.pyplot as plt
def vi_boxplot_single(data, varname=None):
    '''
Box (and Whisker) Plot
----------------------
A box plot is a slightly more complex visualisation than a histogram. It shows the five-number summary (minimum, 1st quartile, median, 3rd quartile, and maximum). It can also be adjusted to show so-called outliers.
This function is shown in this [YouTube video](https://youtu.be/a0Vu6kL_WYo) and the visualisation is described at [PeterStatistics.com](https://peterstatistics.com/Terms/Visualisations/boxPlot.html)
Parameters
----------
data : list or pandas series
the numeric data
varname : string, optional
name to display on vertical axis
Notes
-----
This chart was originally called a 'range chart' (Spear, 1952, p. 166), but these days it is usually referred to as a box-and-whisker plot, the name given by Tukey (1977, p. 39).
The function uses the **boxplot()** function from the *matplotlib* library (`matplotlib.pyplot`). If you want to modify more things (like colors) you might want to use that function directly.
Before, After and Alternatives
------------------------------
Before this you might want to create a binned frequency table with [tab_frequency_bins](../other/table_frequency_bins.html#tab_frequency_bins).
After this you might want some descriptive measures. Use [me_mode_bin](../measures/meas_mode_bin.html#me_mode_bin) for Mode for Binned Data, [me_mean](../measures/meas_mean.html#me_mean) for different types of mean, and/or [me_variation](../measures/meas_variation.html#me_variation) for different Measures of Quantitative Variation
Or perform a test. Various options include [ts_student_t_os](../tests/test_student_t_os.html#ts_student_t_os) for One-Sample Student t-Test, [ts_trimmed_mean_os](../tests/test_trimmed_mean_os.html#ts_trimmed_mean_os) for One-Sample Trimmed (Yuen or Yuen-Welch) Mean Test, or [ts_z_os](../tests/test_z_os.html#ts_z_os) for One-Sample Z Test.
Alternative Visualisations are [vi_boxplot_single](../visualisations/vis_boxplot_single.html#vi_boxplot_single) for a Box (and Whisker) Plot and [vi_histogram](../visualisations/vis_histogram.html#vi_histogram) for a Histogram
References
----------
Spear, M. E. (1952). *Charting statistics*. McGraw-Hill.
Tukey, J. W. (1977). *Exploratory data analysis*. Addison-Wesley Pub. Co.
Author
------
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
---------
Example 1: pandas series
>>> df2 = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv', sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = df2['Gen_Age']
>>> vi_boxplot_single(ex1);
Example 2: Numeric list
>>> ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
>>> vi_boxplot_single(ex2);
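    '''
    # Accept plain lists by converting them to a pandas Series
    if isinstance(data, list):
        data = pd.Series(data)
    # Drop missing values and make sure the remaining values are numeric
    data = pd.to_numeric(data.dropna())
    # 'orientation' requires matplotlib 3.10+; older versions use vert=False
    plt.boxplot(data, orientation='horizontal')
    plt.ylabel(varname)
    # Hide the default tick label ("1") on the vertical axis
    plt.yticks([1], [" "])
    plt.show()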
Functions
def vi_boxplot_single(data, varname=None)
Box (and Whisker) Plot
A box plot is a slightly more complex visualisation than a histogram. It shows the five-number summary (minimum, 1st quartile, median, 3rd quartile, and maximum). It can also be adjusted to show so-called outliers.
This function is shown in this YouTube video and the visualisation is described at PeterStatistics.com
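As a rough illustration of what the plot summarises, the sketch below computes the five-number summary for the numeric list used in Example 2. It also applies the common 1.5 × IQR rule for flagging outliers, which is matplotlib's default whisker setting. This is not code from the package; the names `scores`, `summary`, `lower_fence`, and `upper_fence` are assumptions made for the illustration.

```python
import pandas as pd

# Same numeric list as Example 2 in this documentation
scores = pd.Series([1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5])

# Five-number summary: minimum, Q1, median, Q3, maximum
summary = scores.quantile([0.0, 0.25, 0.5, 0.75, 1.0])

# Fences at 1.5 * IQR beyond the quartiles (illustrative outlier rule)
q1, q3 = summary.loc[0.25], summary.loc[0.75]
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = scores[(scores < lower_fence) | (scores > upper_fence)]

print(summary)
print("outliers:", outliers.tolist())
```

For this data the quartiles are 2, 4, and 5, and no value falls outside the fences, so the plot would show no outlier markers.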
Parameters
data : list or pandas series
the numeric data
varname : string, optional
name to display on vertical axis
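Judging from the source code shown on this page, the function converts a list to a pandas Series, drops missing values, and coerces the remainder to numeric before plotting. A minimal sketch of that preparation step follows; `prepare_data` is a hypothetical helper name used only here and is not part of the package.

```python
import pandas as pd

def prepare_data(data):
    """Hypothetical helper mirroring the cleaning steps in vi_boxplot_single."""
    # Plain lists are wrapped in a pandas Series first
    if isinstance(data, list):
        data = pd.Series(data)
    # Remove missing values, then coerce the remainder to a numeric dtype
    return pd.to_numeric(data.dropna())

# The None entry is dropped and the numeric string "4" is converted to a number
cleaned = prepare_data([3, None, "4", 5.5])
print(cleaned.tolist())
```

Values that cannot be parsed as numbers would make `pd.to_numeric` raise an error, so the data is expected to be numeric apart from missing entries.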
Notes
This chart was originally called a 'range chart' (Spear, 1952, p. 166), but these days it is usually referred to as a box-and-whisker plot, the name given by Tukey (1977, p. 39).
The function uses the boxplot() function from the matplotlib library (matplotlib.pyplot). If you want to modify more things (like colors) you might want to use that function directly.
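For readers who want more control over the appearance, a sketch of calling matplotlib's `boxplot` directly might look like the following. The colours, the `score` axis label, and the variable names are illustrative assumptions. The sketch uses the default vertical orientation so that it does not depend on a particular matplotlib version.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

values = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]

fig, ax = plt.subplots()
bp = ax.boxplot(
    values,
    patch_artist=True,                     # draw the box as a filled patch
    boxprops={"facecolor": "lightblue"},   # fill colour of the box
    medianprops={"color": "darkred"},      # colour of the median line
    flierprops={"marker": "x"},            # marker used for outliers, if any
)
ax.set_ylabel("score")
ax.set_xticks([])  # hide the default category tick

# In an interactive session, call plt.show() or fig.savefig(...) to view the result
```

`boxplot` returns a dictionary of the drawn artists (boxes, medians, whiskers, caps, fliers), which can be styled further after the call.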
Before, After and Alternatives
Before this you might want to create a binned frequency table with tab_frequency_bins.
After this you might want some descriptive measures. Use me_mode_bin for Mode for Binned Data, me_mean for different types of mean, and/or me_variation for different Measures of Quantitative Variation
Or perform a test. Various options include ts_student_t_os for One-Sample Student t-Test, ts_trimmed_mean_os for One-Sample Trimmed (Yuen or Yuen-Welch) Mean Test, or ts_z_os for One-Sample Z Test.
Alternative Visualisations are vi_boxplot_single for a Box (and Whisker) Plot and vi_histogram for a Histogram
References
Spear, M. E. (1952). Charting statistics. McGraw-Hill.
Tukey, J. W. (1977). Exploratory data analysis. Addison-Wesley Pub. Co.
Author
Made by P. Stikker
Companion website: https://PeterStatistics.com
YouTube channel: https://www.youtube.com/stikpet
Donations: https://www.patreon.com/bePatron?u=19398076
Examples
Example 1: pandas series
>>> df2 = pd.read_csv('https://peterstatistics.com/Packages/ExampleData/StudentStatistics.csv', sep=';', low_memory=False, storage_options={'User-Agent': 'Mozilla/5.0'})
>>> ex1 = df2['Gen_Age']
>>> vi_boxplot_single(ex1);
Example 2: Numeric list
>>> ex2 = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
>>> vi_boxplot_single(ex2);