# Fundamentals

## What is statistics?


Perhaps one of the oldest questions humans have is ‘how many?’ or ‘how much?’. How much food do we have, how many likes do I have, how many visitors, how much money, etc. Not surprisingly, heads of state (as in countries) were also very interested in the answers to many of these kinds of questions. How many people live in my state, how many are male, how many female, how many children were born, how big is my state, etc. Early statistics dictionaries did not contain statistical terms, but data about states. An example is shown in Figure 1. It is from a dictionary from 1884.

**Figure 1**

*Part of Mulhall’s dictionary of statistics from 1884*

In Figure 1 you can see the ‘able bodied’ percentages of different countries, where able bodied means men capable of bearing arms (= able to go to war). After a while ideas emerged on not only how to collect data, but also on how to best display it, and later of course also on how to analyse it. That is what **statistics** is all about. It can be defined as: “the science of collecting, displaying and analysing data” (Upton & Cook, 2014, p. 429). Note that the term statistics still has ‘state’ in it, but these days it is no longer used only for data about states but in all kinds of fields.

There are two main branches of statistics: descriptive and inferential (Wright & London, 2009, p. 55). **Descriptive statistics** are: “methods for organizing, displaying, and describing data using tables, graphs and summary measures” (Mann, 1991, 2010, p. 3). Descriptive statistics is the type of statistics most people encounter every day, often without realizing they are looking at statistics. In many video games, results or scores are displayed in various charts or with various comparisons, advertisements try to show off fancy diagrams, and in business reports tables and diagrams often play a key role.
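To make “tables and summary measures” a bit more concrete, here is a minimal sketch in Python. The scores are invented purely for illustration, and the choice of summary measures (count, mean, median, mode) is just one possible selection, not a fixed recipe.

```python
import statistics
from collections import Counter

# Hypothetical game scores, invented purely for illustration
scores = [12, 15, 15, 18, 20, 20, 20, 25]

# A few common summary measures
summary = {
    "count": len(scores),
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
}

# A frequency table: how often each score occurs
frequency = Counter(scores)

for name, value in summary.items():
    print(f"{name}: {value}")

# A very crude text 'bar chart' of the frequency table
for score, count in sorted(frequency.items()):
    print(f"{score}: {'*' * count}")
```

Nothing here says anything beyond the eight scores themselves; it only organizes and describes them, which is exactly what makes it descriptive.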

When collecting data you are usually interested in a specific group of people, animals or things, but don’t have the time (or money) to collect data about all of them. The entire group is known as the **population**: “the complete set of objects of interest” (Upton & Cook, 2014, p. 332). The people/things you actually got data from are then known as a **sample**: “a subset of a population usually chosen in such a way that it can be taken to represent the population with respect to some characteristic” (Upton & Cook, 2014, p. 379).

Wouldn’t it be great if you could say something about the population based on just one sample? It would save a lot of time and/or money. A fancy word for making a statement about a population based on a sample is an **inference**: “a conclusion about a population based on logical reasoning from data gathered about a smaller sample” (Zedeck, 2014, p. 175).

As it turns out, this is possible, and the field that deals with it is known as inferential statistics. When mathematicians started developing probability theory around 1700 they laid the foundations for this, and later, around 1850, the techniques that we use today were developed. **Inferential statistics** could therefore be defined as the field of statistics that tries to say something about a population, based on a sample from that population.

How you obtain your sample is important, though. Different methods exist to do this and these are known as **sampling** techniques, where sampling is: “the process of selecting a limited number of units from a larger set for a study” (Zedeck, 2014, p. 322).

Figure 2 shows how the terms population, sampling, sample, and inference relate to each other.

**Figure 2**

*Relation between population, sampling, sample and inference*
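The relation between these four terms can also be sketched in a few lines of Python. Everything in this sketch is an assumption made for illustration: the population is simulated (hypothetical heights in cm), the sampling technique is simple random sampling (only one of several possible techniques), and the inference is nothing more than using the sample mean as an estimate of the population mean.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Population: hypothetical heights (cm) of 10,000 people, simulated for illustration
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Sampling: simple random sampling of 100 units, without replacement
sample = rng.sample(population, k=100)

# Inference: use the sample mean as an estimate of the population mean
sample_mean = statistics.mean(sample)

# In practice the population mean is unknown; it can be computed here
# only because the whole population was simulated
population_mean = statistics.mean(population)

print(f"sample mean:     {sample_mean:.1f}")
print(f"population mean: {population_mean:.1f}")
```

The two printed values will usually be close but not identical. How close you can expect them to be is one of the questions inferential statistics tries to answer.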

Especially in inferential statistics a lot of calculations are done, and with them come many formulas. These formulas are also implemented in various statistical software programs (like R, SPSS, MiniTab, and also Excel). I would call statistics that deals with the formula itself, the calculation and the proof, **mathematical statistics**, while if the focus is on the interpretation of the result, I’d call it **applied statistics**.

Unfortunately for those who dislike calculations, in order to interpret the results of a statistical analysis, it is sometimes necessary to understand the calculation. Don’t worry, I’ll try to explain calculations in small steps.

Statistics is all about data, so we might want to discuss a few terms connected to data itself. This will be the focus of the next section.

