Terms about data
Since statistics is all about data, we need to discuss a few terms that are linked to data in general and which will come up throughout this course every now and again.
The first term indicates what or whom you are collecting data about: the cases (sometimes called units or observations). With a survey you collect data about respondents, so the respondents are your cases. If you collect the average daily temperature in your city, then every day is a case; if you collect the 2016 revenue of various companies in your industry, then every company is a case. ‘Cases’ is therefore a more general term than ‘respondents’: respondents are always cases, but cases are not always respondents.
From each case we want to know a few things. These things are called variables.
To illustrate this, let’s imagine we have the following survey:
What is your gender? O Male O Female
What is your age? ______(year)
What is your name? ____________
What do you think of statistics? O very boring O boring O interesting O very interesting
In this survey each question is a variable. Usually a variable is given a short name or abbreviation; in the example survey these might be gender, age, name and opinion.
Note that a variable is something that can vary. The possible variations of a variable are known as the values. For the variables in the example survey the values for gender would be ‘male’ and ‘female’, for age any number above 0, for name all possible names and for opinion the options ‘very boring’, ‘boring’, ‘interesting’ and ‘very interesting’.
These values are often assigned a letter or a number, e.g. m = male and f = female, or 0 = male and 1 = female. This is known as coding: assigning one or more letters or numbers to represent a value. For open-ended questions no coding is required.
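As a minimal sketch of what coding looks like in practice, the snippet below stores the 0 = male, 1 = female scheme in a Python dictionary and uses it to convert answers to codes and back. The names `GENDER_CODES`, `encode` and `decode`, as well as the three sample answers, are made up for this illustration.

```python
# Coding scheme from the text: 0 = male, 1 = female.
GENDER_CODES = {"male": 0, "female": 1}


def encode(value, codes):
    """Return the code assigned to a value; fail loudly for unknown values."""
    if value not in codes:
        raise ValueError(f"no code defined for value {value!r}")
    return codes[value]


def decode(code, codes):
    """Return the value that a code stands for."""
    reverse = {c: v for v, c in codes.items()}
    return reverse[code]


# Three invented answers to the gender question.
answers = ["male", "female", "female"]
coded = [encode(a, GENDER_CODES) for a in answers]

print(coded)                    # [0, 1, 1]
print(decode(1, GENDER_CODES))  # female
```

Raising an error for an unknown value is a design choice in this sketch: it keeps typos such as ‘femail’ from silently ending up in the data.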
Table 1 shows the conversion of the example survey into variables, values and coding. Such an overview is sometimes referred to as a codebook.
|Variable name|Variable description|Values|Coding|
|---|---|---|---|
|Gender| |Male and female|0 = male; 1 = female|
|Age|Age in years|>0| |
|Opinion|Opinion on statistics course|very boring, boring, interesting, very interesting|1 = very boring; 2 = boring; 3 = interesting; 4 = very interesting|
Note that the variable description is either a longer description of the name of the variable, or sometimes the entire question as it was asked on the survey.
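A codebook can also be kept in machine-readable form. The following is only one possible sketch, using a plain Python dictionary with one entry per variable; the structure and the helper name `label` are assumptions made for this example. For gender, the description is taken to be the survey question itself.

```python
# One entry per variable. "coding" is None for open-ended questions,
# which need no coding.
codebook = {
    "gender": {
        "description": "What is your gender?",
        "coding": {0: "male", 1: "female"},
    },
    "age": {
        "description": "Age in years",
        "coding": None,
    },
    "opinion": {
        "description": "Opinion on statistics course",
        "coding": {
            1: "very boring",
            2: "boring",
            3: "interesting",
            4: "very interesting",
        },
    },
}


def label(variable, score):
    """Translate a stored score back into its value using the codebook."""
    coding = codebook[variable]["coding"]
    if coding is None:
        return score  # uncoded variable: the score already is the value
    return coding[score]


print(label("opinion", 3))  # interesting
print(label("age", 35))     # 35
```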
The last term connected to this is a score. A score is the value (or assigned code) for a single case on a single variable. For example, since I’m male, my score on the variable gender in the example survey would be a 0. Table 2 lists my other scores on the survey.
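To see how cases, variables and scores fit together, the sketch below stores a tiny data set as a list of dictionaries: each dictionary is one case, each key is a variable, and each stored entry is a score. The three respondents and all of their scores are invented for illustration only, and the helper name `score` is likewise an assumption.

```python
# Invented data: three cases (respondents) with coded scores on three variables.
cases = [
    {"gender": 0, "age": 34, "opinion": 3},
    {"gender": 1, "age": 27, "opinion": 4},
    {"gender": 1, "age": 51, "opinion": 2},
]


def score(case_index, variable):
    """Return the score of a single case on a single variable."""
    return cases[case_index][variable]


variables = sorted(cases[0])                            # names of the variables
observed_gender = sorted({c["gender"] for c in cases})  # codes that occur

print(len(cases))          # 3
print(variables)           # ['age', 'gender', 'opinion']
print(score(0, "gender"))  # 0
print(observed_gender)     # [0, 1]
```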
Now that we are familiar with the terms connected to data, we can move on to the next fundamental topic: measurement levels, the subject of the next section.