Perhaps the most important term in inferential statistics is significance, which can be defined as: the probability of a result as in the sample, or even more extreme, if the assumption about the population is true.
This definition is often a bit difficult to fully understand, but the term will come back very often, so I'll try to explain it in some detail. What is important here is that with a survey you have taken only one sample out of the many possible samples that could have been taken from a population. What we would like to know is: what can be said about the entire population based on a single sample from that population (without having to take all possible samples)? One approach to answering this is to use the significance. First the definition will be clarified with an example, followed by some discussion of how to interpret this probability.
To illustrate significance, let's use an example. I have two coins (A and B), and I've flipped each coin 200 times. Coin A came up heads 190 times (and tails 10 times); Coin B came up heads 92 times (and tails 108 times). Which coin do you think might be fair (fair meaning an equal chance of heads and tails)?
Most people would go for Coin B. The reasoning would go as follows:
- If a coin is fair and we throw it an infinite number of times, we should see heads and tails in equal proportion. The chance of heads is then 0.5 (50%).
- If we throw a fair coin 200 times, we'd expect heads about 50% of the time, so about 100 times.
- A small deviation from this expected 100 is likely to occur, but not a large one.
- With Coin A we have a deviation of (190 – 100 =) 90 from the expected 100, while with Coin B we have a deviation of (100 – 92 =) 8.
- Most people would consider 90 a large deviation in this example, and 8 a small one.
- Most people would therefore consider Coin B the one most likely to be fair.
In this example, throwing the coin an infinite number of times (point 1) would be the population. Our assumption about the population is that the coin is fair. The 90 and 8 from point 4 were our 'result as in the sample'. Now the chance of a deviation of exactly 90 is very low, and the chance of a deviation of exactly 8 is also very low. We are actually interested in the chance of a deviation of 90 or more, and of a deviation of 8 or more. This is the 'or even more extreme' in the definition of significance.
How these chances are calculated is not so important right now, but the chance of a deviation of 8 or more is almost 30%, while the chance of a deviation of 90 or more is almost 0%. A deviation of 8 or more is likely to occur if the coin is fair, but a deviation of 90 or more isn't. This does not mean that a deviation of 90 or more is impossible; it is just very unlikely if the coin is fair.
These chances (the 30% and the near 0%) are the significances. Each is the chance of a deviation as in the sample, or even more extreme, if the coin is fair.
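For readers who do want to see where the 30% and the near 0% can come from, here is a minimal sketch in Python. It assumes an exact count over all possible heads/tails sequences of a fair coin, and it counts deviations in both directions (too many heads as well as too few). The function name `fair_coin_significance` is made up for this illustration; it is not necessarily the method used elsewhere in this course.

```python
from math import comb


def fair_coin_significance(heads, flips):
    """Chance of a deviation at least as large as the observed one, if the coin is fair."""
    expected = flips / 2
    deviation = abs(heads - expected)
    # Count every heads/tails sequence whose number of heads deviates
    # from the expected count at least as much as the sample did.
    extreme = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= deviation
    )
    # For a fair coin, all 2**flips sequences are equally likely.
    return extreme / 2 ** flips


coin_a = fair_coin_significance(190, 200)
coin_b = fair_coin_significance(92, 200)
print(f"Coin A: {coin_a:.3g}")
print(f"Coin B: {coin_b:.3f}")
```

Under these assumptions Coin B should come out at roughly 0.29, and Coin A at a value that is practically indistinguishable from 0, in line with the 'almost 30%' and 'almost 0%' mentioned above.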
Interpretation of significance
If this probability is very low, it means that either we have been very 'lucky' and happened to pick one of the few samples with this result, or the assumption about the population is wrong. In statistics, we usually conclude that the assumption is wrong if the probability is below 5% (0.05). In some cases the threshold is placed at another value (1% and 10% are also sometimes used).
Note that this way we allow ourselves a 5% risk of making the wrong decision when the assumption is true. It could be that the assumption about the population is true, but that we happened to draw one of the unlikely samples. Rejecting the assumption about the population although it is actually true is known as a type I error. We also have a risk the other way around: if the probability is above 0.05, we do not reject the assumption about the population, although it could actually be false. This is known as a type II error.
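The risk of a type I error can be made visible with a small simulation. The sketch below is only an illustration under stated assumptions: it repeatedly flips a coin that really is fair 200 times, computes the significance of each sample with an exact two-sided count, and records how often the 5% threshold would lead us to reject the (true) assumption. The names `significance`, `P_VALUES` and `type_one_error_rate`, the number of experiments and the random seed are all choices made for this example.

```python
import random
from math import comb

FLIPS = 200
ALPHA = 0.05  # the usual 5% threshold


def significance(heads, flips=FLIPS):
    """Chance of a deviation at least this large, if the coin is fair."""
    expected = flips / 2
    deviation = abs(heads - expected)
    extreme = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= deviation
    )
    return extreme / 2 ** flips


# Significance for every possible number of heads, computed once up front.
P_VALUES = [significance(h) for h in range(FLIPS + 1)]


def type_one_error_rate(experiments=2000, seed=1):
    """Share of fair-coin samples in which we would wrongly reject fairness."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(experiments):
        # One sample: flip a genuinely fair coin FLIPS times.
        heads = sum(rng.random() < 0.5 for _ in range(FLIPS))
        if P_VALUES[heads] < ALPHA:
            rejections += 1
    return rejections / experiments


rate = type_one_error_rate()
print(f"Rejected a fair coin in {rate:.1%} of the samples")
```

The printed share should land near, and typically somewhat below, 5%. It does not hit 5% exactly because the number of heads can only take whole values, so the significance jumps in steps rather than moving smoothly.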
The 'assumption about the population' is also known as the 'null hypothesis', and it should always be formulated in such a way that things are equal, or have no difference, or no relation. Referring back to the coin example, we cannot test if the coin is unfair (i.e. the deviation is not equal to 0), only if it is fair (i.e. the deviation is equal to 0). Strictly speaking, we also never accept the assumption: we either reject it, or claim 'there was insufficient evidence to reject the assumption'.
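The wording convention can be captured in a few lines of code. This is just a sketch: the function name `report` and the two example values (stand-ins for the 'almost 30%' and 'almost 0%' of the coin example) are assumptions made for illustration.

```python
def report(significance, threshold=0.05):
    """Phrase the outcome of a test without ever 'accepting' the assumption."""
    if significance < threshold:
        return "reject the assumption about the population"
    return "insufficient evidence to reject the assumption about the population"


print("Coin B:", report(0.29))     # stand-in for 'almost 30%'
print("Coin A:", report(0.00001))  # stand-in for 'almost 0%'
```

Note that neither branch ever says the assumption is accepted or proven true; a high significance only means the sample gave us no good reason to reject it.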
It is often underappreciated how remarkable it is that the significance can actually be calculated. How this is done in general is a bit too technical for this crash course, where we focus on the interpretation of the results rather than on the computations.