# Analysing a single scale variable

## 1a: impression from sample data

All the frequency types discussed for nominal and ordinal variables can also be applied to a scale variable. One complication, however, is that a scale variable often has so many distinct values that the table becomes very long. Since the point of a table is to give a clear overview, and a long table often isn’t very clear, this creates a problem. The solution is to create bins (or classes), as shown in Table 1.

Age | Frequency |
---|---|
15 < 25 | 170 |
25 < 35 | 363 |
35 < 45 | 358 |
45 < 55 | 360 |
55 < 65 | 324 |
65 < 75 | 222 |
75 < 85 | 125 |
85 < 95 | 47 |

The table shows that there were 170 respondents in the age bin of 15 < 25. The symbol '<' is used for 'but under', so someone aged 25 would fit into 25 < 35, but not into 15 < 25. Sometimes ≤ is used, which stands for 'less than or equal to'. A more technical notation uses square brackets, [ or ], to indicate that a bound is included, and round brackets, ( or ), to indicate that it is excluded. The interval 15 < 25 is then the same as [15,25), and the interval 15 ≤ 24 is the same as [15,24]. Another symbol often used is a hyphen (-). It is, however, sometimes used as < (Chaudhary, Kumar, & Alka, 2009; Sharma, 2007), and sometimes as ≤ (Beri, 2010; Haighton, Haworth, & Wake, 2003).
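The 'but under' convention can be made concrete with a short sketch. It assumes the bin edges of Table 1 and treats every bin as the half-open interval [lower, upper); the names `EDGES` and `bin_label` are made up for this illustration.

```python
from bisect import bisect_right

# Bin edges taken from Table 1: the lower bound of each bin,
# followed by the upper bound of the last bin.
EDGES = [15, 25, 35, 45, 55, 65, 75, 85, 95]


def bin_label(age, edges=EDGES):
    """Return the label of the [lower, upper) bin that contains age.

    Returns None if the age falls outside all bins.
    """
    if age < edges[0] or age >= edges[-1]:
        return None
    # bisect_right counts the edges that are <= age, so the bin's
    # lower bound sits one position before that count.
    i = bisect_right(edges, age) - 1
    return f"{edges[i]} < {edges[i + 1]}"


print(bin_label(24))  # falls in 15 < 25
print(bin_label(25))  # falls in 25 < 35, because the upper bound is excluded
```

Because the upper bound is excluded, an age of exactly 25 lands in the second bin, as described above.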

The lower end of a bin is called the **lower bound** and the upper end the **upper bound** (e.g. the bin 15 < 25 has as a lower bound 15 and as an upper bound 25).

When creating these bins, two important rules should be met:

- Bins should not overlap. So do not use 15 < 25 and 20 < 35, since a person who is 22 years old would then fit into both. This sometimes goes wrong when people use ≤ instead of <.
- Each score should fit into a bin. This means that the lower bound of the first bin should not be above the lowest score, and the upper bound of the last bin should be above the highest score.

These two rules can be combined into one: each score should fit into exactly one bin.
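The combined rule is easy to check mechanically. The sketch below assumes bins are given as (lower, upper) pairs with the lower bound included and the upper bound excluded; the function names and the example scores are invented for illustration.

```python
def bins_containing(score, bins):
    """Return every (lower, upper) bin for which lower <= score < upper."""
    return [(lo, hi) for lo, hi in bins if lo <= score < hi]


def check_bins(scores, bins):
    """Map each problematic score to the number of bins it falls into.

    An empty result means every score fits into exactly one bin.
    """
    problems = {}
    for score in scores:
        count = len(bins_containing(score, bins))
        if count != 1:
            problems[score] = count
    return problems


# Overlapping bins from the text, plus a score that no bin covers.
overlapping = [(15, 25), (20, 35)]
print(check_bins([22, 40], overlapping))

# The first three bins of Table 1 satisfy the rule for these scores.
table_bins = [(15, 25), (25, 35), (35, 45)]
print(check_bins([15, 22, 25, 44], table_bins))
```

A count of 2 signals overlapping bins, and a count of 0 signals a score that no bin covers.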

There are also various formulas to help decide how many bins to use, or how wide each bin should be. This is important because the results might look different depending on how the bins are set up. Formulas for the number of bins include **Sturges’ rule** (Sturges, 1926, p. 65) and the **Square Root Choice** (Duda & Hart, 1973), while some authors simply use a ‘rule of thumb’. One such rule of thumb is from Herkenhoff and Fogli (2013, p. 58), who recommend between 5 and 15 bins. Anything more than 15 might cause the table to become unclear (which is exactly what we are trying to avoid), and with anything less than 5 we might lose too much information.

By creating bins we lose some information, since we no longer see exactly what, for example, the ages were of the 170 people in the 15 < 25 bin.
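As a rough illustration of the two formulas named above, the sketch below uses their commonly quoted forms: Sturges' rule as ⌈log2(n)⌉ + 1 and the square root choice as ⌈√n⌉. The rounding up is an assumption, since sources differ on how to round. The sample size n is the sum of the frequencies in Table 1.

```python
import math


def sturges_bins(n):
    """Number of bins by Sturges' rule: ceil(log2(n)) + 1."""
    return math.ceil(math.log2(n)) + 1


def square_root_bins(n):
    """Number of bins by the square root choice: ceil(sqrt(n))."""
    return math.ceil(math.sqrt(n))


# Frequencies from Table 1; their sum is the sample size.
frequencies = [170, 363, 358, 360, 324, 222, 125, 47]
n = sum(frequencies)

print(n)                    # 1969
print(sturges_bins(n))      # 12
print(square_root_bins(n))  # 45
```

For this sample, Sturges' rule stays within the 5 to 15 range of the rule of thumb, while the square root choice suggests far more bins than that range allows.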

By binning a scale variable, we actually convert it into an ordinal variable, and we could use all the types of frequencies discussed there as well.
If the bin sizes are all the same this is fine, but if some bin sizes differ from others, the frequencies might distort the truth. If bin sizes are not equal, we should use something known as ‘**frequency density**’. This is discussed in the appendix below.
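To give an idea of why unequal bin sizes matter, here is a minimal sketch assuming the usual definition of frequency density, namely the frequency divided by the bin width (the text itself does not spell the formula out). For illustration only, the last two bins of Table 1 are merged into a single bin 75 < 95 with frequency 125 + 47 = 172.

```python
def frequency_density(lower, upper, frequency):
    """Frequency per unit of bin width (assumed definition)."""
    return frequency / (upper - lower)


# (lower bound, upper bound, frequency); the last bin is twice as wide
# as the others because two bins of Table 1 were merged for this example.
bins = [
    (15, 25, 170),
    (25, 35, 363),
    (35, 45, 358),
    (45, 55, 360),
    (55, 65, 324),
    (65, 75, 222),
    (75, 95, 172),
]

densities = {
    f"{lo} < {hi}": frequency_density(lo, hi, freq) for lo, hi, freq in bins
}

for label, density in densities.items():
    print(label, density)
```

The raw frequencies make 75 < 95 (172) look slightly larger than 15 < 25 (170), but per year of age the merged bin holds only about half as many respondents, which is what the density shows.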

A visualisation of the sample data might also give a good impression. This is discussed in the next section.

**Appendix**

<Frequency density details to be uploaded, but for now check the full site here>
