# Analysing a single scale variable

## 1b: Visualisation

In the previous part we got a first impression of the data, but it is often useful to visualise the results as well. Two commonly used diagrams for this are the histogram and the box plot. I will explain the histogram here; for the interested reader, the box plot is discussed in the appendix at the bottom of this page.

To illustrate a histogram, we start by showing what we would get if we simply drew a bar-chart of the scale variable, as shown in Figure 1.

**Figure 1.**

*Bar chart of a scale variable*

In the bar-chart, a bar is drawn for each age that occurs, with a height equal to its count. The bar-chart was discussed in more detail in section 2.2, but for a scale variable there are often so many bars that the chart is not very insightful. To reduce the number of bars, the scale variable is often recoded into categories (bins), as we also did in the previous section. If the bins are of equal width (size), we get the bar-chart shown in Figure 2.

**Figure 2.**

*Bar chart of a binned scale variable*
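As a rough illustration of the binning step, the sketch below counts how many scores fall into each equal-width bin. The ages, the starting point of 10 and the bin width of 10 are all made up for this example, and `bin_counts` is a hypothetical helper name, not something from the videos or notebooks on this page.

```python
from collections import Counter


def bin_counts(values, start, width):
    """Count the values in each equal-width bin [lower, lower + width)."""
    counts = Counter()
    for v in values:
        # Index of the bin the value falls in; bin 0 starts at `start`.
        index = int((v - start) // width)
        counts[start + index * width] += 1
    # Note: bins without any values are simply not listed.
    return dict(sorted(counts.items()))


# Hypothetical ages, made up for illustration only.
ages = [18, 19, 22, 25, 25, 31, 34, 38, 41, 47, 52, 58, 63, 67, 71]

for lower, count in bin_counts(ages, start=10, width=10).items():
    print(f"{lower} to < {lower + 10}: {count}")
```

Each printed line corresponds to one bar in a chart like Figure 2: the bin gives the position of the bar and the count gives its height.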

To emphasize that we actually have the original scores, and did not ask in the survey for an age category (but simply for the age itself), the bars are placed directly next to each other and the horizontal scale becomes a number line. This chart is then no longer called a bar-chart, but a histogram. Figure 3 shows the histogram of the age of the respondents.

**Figure 3.**

*Histogram of scale variable.*

**How to create a simple histogram...**

**with Excel**

Videos on how to create a simple histogram: for equal bin widths (in two Excel versions) and for unequal bin widths.

*equal class widths Excel 2007-2013*

Excel file from video: VI - Histogram (single).xlsx.

*equal class widths Excel 2016-2019*

Excel file from video: VI - Histogram (single).xlsx.

*unequal class widths*

Excel file from video: VI - Histogram (single).xlsx.

**with Python**

Jupyter Notebook used in video: VI - Histogram.ipynb.

Data file used in video and notebook GSS2012a.csv.

**with R (Studio)**

R script used in video: VI - Histogram.R.

Datafile used in video: GSS2012-Adjusted.sav

**with SPSS**

There are four different ways to create a histogram with SPSS.

*using Chart Builder*

Watch the video below, or download the PDF instructions (via bitly, opens in new window/tab).

Datafile used in video: Results.sav

*using Legacy Dialogs*

Watch the video below, or download the PDF instructions (via bitly, opens in new window/tab).

Datafile used in video: Results.sav

*using Frequencies*

Watch the video below, or download the PDF instructions (via bitly, opens in new window/tab).

Datafile used in video: Results.sav

*using Explore*

Watch the video below, or download the PDF instructions (via bitly, opens in new window/tab).

Datafile used in video: Results.sav

Originally, a histogram should also make use of so-called frequency densities (Pearson, 1895, p. 399), i.e. the frequency of a bin divided by the width of that bin, but if you keep the width of each bin the same, this can be ignored.
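To give an idea of what frequency densities look like when the bins are not equally wide, here is a minimal sketch. It assumes the usual definition of a frequency density as the count of a bin divided by its width; the bins and counts are made up, and `frequency_densities` is a hypothetical helper name.

```python
def frequency_densities(bins):
    """Return the frequency density (count / bin width) for each bin.

    `bins` is a list of (lower, upper, count) tuples.
    """
    return [count / (upper - lower) for lower, upper, count in bins]


# Hypothetical binned ages with unequal widths, made up for illustration.
bins = [(10, 20, 4), (20, 30, 10), (30, 50, 12), (50, 90, 8)]

for (lower, upper, count), density in zip(bins, frequency_densities(bins)):
    print(f"{lower} to < {upper}: count = {count}, density = {density}")
```

In this made-up example the bin from 30 to 50 has the highest count, but because it is twice as wide as the bin from 20 to 30 its density is lower. Using the densities as bar heights makes the area of each bar, rather than its height, proportional to the count. With equal bin widths every count is divided by the same number, so the shape of the histogram does not change.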

When showing a chart it is good to also talk a little bit about it. For a scale variable you might want to describe the shape of the histogram. It of course always depends on your specific data, but inform your reader what you notice from the graph or what you want to show.

An alternative to a histogram that is sometimes used is a box-plot. This diagram, however, requires knowledge of quartiles and outliers. Although a box-plot might in some cases be a better diagram to use than a histogram, you should also wonder if your reader(s) will understand a box-plot. A histogram is often easily understood by people, but a box-plot isn't. In the appendix below you can read more on box plots.

Besides a frequency table and its visualisation, we can also use some measures to describe the data. This is the topic of the next section.

**Appendix: Box-and-Whiskers Plot**

A box plot is a slightly more complex visualisation than a histogram. An example is shown in Figure 4.

**Figure 4.**

*Example of a Box Plot*

It shows the five-number summary (i.e. the minimum, 1st quartile, median, 3rd quartile, and maximum). It can also be adjusted to show so-called outliers.

The first quartile is the value for which 25% of all scores are less than or equal to it; for the median this is 50%, and for the third quartile 75%. Note that various methods exist for calculating the quartiles (see https://mathworld.wolfram.com/Quartile.html).
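That different methods can give different quartiles is easy to see with a small sketch. The one below uses the `quantiles` function from Python's standard `statistics` module, which offers an "exclusive" and an "inclusive" method; these are only two of the many methods in use, and the scores are made up for illustration.

```python
import statistics

# Hypothetical scores, made up for illustration only.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# n=4 asks for the three cut points that split the data into four parts.
q_exclusive = statistics.quantiles(scores, n=4, method="exclusive")
q_inclusive = statistics.quantiles(scores, n=4, method="inclusive")

print("exclusive:", q_exclusive)  # [2.75, 5.5, 8.25]
print("inclusive:", q_inclusive)  # [3.25, 5.5, 7.75]
```

Both methods agree on the median, but the first and third quartiles differ. Software packages may therefore show slightly different boxes for the same data.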

To create the box plot, a 'box' is drawn with the 1st and 3rd quartile at either end. A line in the box is drawn at the median. Then from the middle of each end a line (whisker) is drawn to the maximum and minimum. This was originally called a 'range chart' (Spear, 1952, p. 166), but these days it is usually referred to as a box-and-whisker plot, as named by Tukey (1977, p. 39).

Often, values that are more than 1.5 times the inter-quartile range (IQR) above the 3rd quartile, or below the 1st quartile, are shown as dots or asterisks, and each whisker is then drawn to the most extreme value that still falls within this 1.5 times IQR limit.

The inter-quartile range is simply the 3rd quartile minus the first.
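The 1.5 times IQR rule can be sketched as follows. This is only an illustration under some assumptions: the quartiles are computed with the "inclusive" method of Python's `statistics.quantiles` (other methods may give other limits), the ages are made up, and `box_plot_elements` is a hypothetical helper name.

```python
import statistics


def box_plot_elements(values):
    """Return quartiles, IQR, whisker ends and outliers (1.5 * IQR rule)."""
    data = sorted(values)
    # Assumption: the 'inclusive' quartile method; others exist.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    # Values within the limits; the whiskers end at the most extreme of these.
    inside = [v for v in data if lower_limit <= v <= upper_limit]
    outliers = [v for v in data if v < lower_limit or v > upper_limit]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Hypothetical ages, made up for illustration only.
ages = [21, 23, 24, 25, 27, 28, 30, 31, 33, 70]
result = box_plot_elements(ages)
print(result)
```

For these made-up ages the quartiles are 24.25 and 30.75, so the IQR is 6.5 and the limits lie at 14.5 and 40.5. The age of 70 falls above the upper limit and would be drawn as a separate dot, while the upper whisker stops at 33.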

In Figure 5 each element is indicated.

**Figure 5.**

*Annotated Box Plot*

Note that each of the four main segments contains 25% of the data. So a longer segment indicates that the data in that segment is more spread out. To visualise this, a small animation is shown in Figure 6.

**Figure 6.**

*Box-Plot to Histogram Animation*

The animation starts with 24 data points represented as circles. Since for a box-plot each segment should have 25% of the data, we place 24/4 = 6 circles in each segment. Then draw a rectangle around each segment and you have a histogram from the box plot.
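The same idea can be put into numbers. The sketch below treats each of the four segments of a box plot as a histogram bar that holds 25% of the data, and uses 0.25 divided by the width of the segment as its height. That choice of height is my own assumption for this illustration, the five values are made up, and `segment_densities` is a hypothetical helper name.

```python
def segment_densities(minimum, q1, median, q3, maximum):
    """Return (lower, upper, density) for the four box-plot segments.

    Each segment holds 25% of the data, so its height is taken as
    0.25 divided by the segment width (segments must not have width 0).
    """
    edges = [minimum, q1, median, q3, maximum]
    return [(lo, hi, 0.25 / (hi - lo)) for lo, hi in zip(edges, edges[1:])]


# Hypothetical minimum, quartiles and maximum, made up for illustration.
for lower, upper, density in segment_densities(18, 28, 33, 43, 83):
    print(f"{lower} to {upper}: width = {upper - lower}, density = {density}")
```

The last segment (43 to 83) is by far the longest and therefore gets the lowest bar: the same 25% of the data is spread out over a much wider range.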

**How to create a box-and-whiskers plot...**

**with Excel**

The easiest method with Excel is to draw a box-plot vertically, but if you must have it horizontally, that can also be done.

*Vertical*

Excel file from video: VI - Box Plot (single).xlsx.

*Horizontal, using stacked bar trick*

Excel file from video: VI - Box Plot (single).xlsx.

*Horizontal, using scatterplot trick*

Excel file from video: VI - Box Plot (single).xlsx.

**with Python**

video to be uploaded

Jupyter Notebook: VI - Box Plot (single).ipynb.

Data file used: GSS2012a.csv.

**with R (Studio)**

video to be uploaded

R script: VI - Box Plot (Single).R.

Data file used: GSS2012a.csv.

**with SPSS**

A box-plot of a single variable can either be made using the Chart Builder, or the Legacy Dialogs. It doesn't really matter which you use.

*via Chart Builder*

Data file: GSS2012-Adjusted.sav

*via Legacy Dialogs*

Data file: GSS2012-Adjusted.sav

