Analysing a binary vs. ordinal variable
1a: Impression of the data (cross table)
A table that shows the results of two variables at once is known as a cross table, and is sometimes called a contingency table or a cross tabulation. The term contingency table was introduced by Pearson (1904, p. 34), who used the term contingency as an alternative for association (relation). A cross table is a useful tool to start exploring possible relations between two variables. An example is shown in Table 1.
| | | Female Count | Female Percent | Male Count | Male Percent | Total Count | Total Percent |
|---|---|---|---|---|---|---|---|
| Valid | Far too little | 1 | 9% | 1 | 3% | 2 | 4% |
| | Too little | 2 | 18% | 2 | 6% | 4 | 9% |
| | Enough | 8 | 73% | 15 | 44% | 23 | 51% |
| | Too much | 0 | 0% | 14 | 41% | 14 | 31% |
| | Far too much | 0 | 0% | 2 | 6% | 2 | 4% |
| | Total | 11 | 100% | 34 | 100% | 45 | 100% |
A cross table like the one above can be created with SPSS, with R, or with Excel. With SPSS there are two different ways to create a cross table: using Crosstabs or using Custom Tables.
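For readers who work in Python instead, the same kind of table can be sketched with pandas. This is only an illustration, not part of the original instructions: the column names `gender` and `opinion` are made up here, and the raw data is rebuilt from the counts in Table 1 because the original data file is not available.

```python
import pandas as pd

# Answer options in their ordinal order, and the counts copied from Table 1.
levels = ["Far too little", "Too little", "Enough", "Too much", "Far too much"]
counts = {
    "Female": [1, 2, 8, 0, 0],
    "Male": [1, 2, 15, 14, 2],
}

# Expand the counts into one row per respondent, as raw survey data would look.
# The column names "gender" and "opinion" are hypothetical.
rows = [
    {"gender": gender, "opinion": level}
    for gender, freqs in counts.items()
    for level, n in zip(levels, freqs)
    for _ in range(n)
]
df = pd.DataFrame(rows)

# Dependent variable in the rows, independent variable in the columns.
# crosstab sorts the row labels alphabetically, so restore the ordinal order.
table = pd.crosstab(df["opinion"], df["gender"]).reindex(levels)

# Add the row totals, then the column totals (which also gives the grand total).
table["Total"] = table.sum(axis=1)
table.loc["Total"] = table.sum()

print(table)
```

The explicit `reindex` matters for an ordinal variable: without it the answer options would appear in alphabetical order rather than from 'Far too little' to 'Far too much'.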
In this example the so-called independent variable is placed in the columns. An independent variable is, as the name implies, a variable that does not depend on another variable. In the example, gender is unlikely to depend on someone’s opinion, so gender is the independent variable. The other variable (5.3 The amount of…) is the dependent variable, since this might depend on someone’s gender.
My suggestion would be as follows. If you have one variable that will not fit into the columns, place it in the rows. Otherwise, if you have a clear independent variable and a dependent variable, place the independent variable in the columns. In all other cases, do what you want :D. See the note at the bottom of this page for more details on this.
In a cross table we have three types of totals: the row total, the column total and the grand total (also known as the table total). This means we can also calculate relative frequencies (percentages) in three different ways. Using the Female-Far too little cell as an example, we could say that this is 1 / 2 × 100 = 50%, using the row total. This means that 50% of those who chose ‘far too little’ in the survey are female. We could also say that it is 1 / 11 × 100 ≈ 9%, using the column total. This means that 9% of the females in the survey indicated ‘far too little’. Finally, we could say that it is 1 / 45 × 100 ≈ 2%, using the grand total. This means that 2% of all the respondents are female and indicated ‘far too little’. If you have an independent variable, you often want to compare the results of one of its categories with those of the other. It is therefore recommended to use the total of the independent variable (usually in the columns) for the calculation.
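The three ways of calculating a percentage can be checked with a small Python sketch. It assumes pandas and simply types in the counts from Table 1; the variable names are made up for this illustration.

```python
import pandas as pd

# Counts copied from Table 1 (rows: answer options, columns: gender).
levels = ["Far too little", "Too little", "Enough", "Too much", "Far too much"]
counts = pd.DataFrame(
    {"Female": [1, 2, 8, 0, 0], "Male": [1, 2, 15, 14, 2]},
    index=levels,
)

# Row percentages: each cell divided by its row total.
row_pct = counts.div(counts.sum(axis=1), axis=0) * 100

# Column percentages: each cell divided by its column total.
col_pct = counts.div(counts.sum(axis=0), axis=1) * 100

# Total percentages: each cell divided by the grand total.
total_pct = counts / counts.sum().sum() * 100

# The Female-Far too little cell used as the example in the text.
r, c = "Far too little", "Female"
print(f"row: {row_pct.loc[r, c]:.0f}%")      # row: 50%
print(f"column: {col_pct.loc[r, c]:.0f}%")   # column: 9%
print(f"total: {total_pct.loc[r, c]:.0f}%")  # total: 2%
```

With gender as the independent variable in the columns, `col_pct` is the version that matches the percentages shown in Table 1.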
From Table 1 we can note that no females indicated there was too much (or far too much), whereas some males thought there was too much and two even thought there was far too much. It might help to visualise the result, so this will be discussed in the next section.
Notes on the construction of a cross table
When constructing a cross table as above, the first decision is which variable will be in the columns and which will be in the rows. Some textbooks decide this based on which variable is the independent and which is the dependent variable. As the names imply, an independent variable is a variable that does not depend on something, and the dependent variable does (Porkess, 1991, p. 64). In the example table it is unlikely that your gender will depend on your opinion on question 5.3, but it could be that your opinion on question 5.3 depends on your gender. In the example, gender is therefore the independent variable, and ‘5.3 The amount…’ the dependent. Demographic variables (age, gender, city, etc.) are often independent variables.
One problem, however, is that some textbooks say to place the independent variable in the columns (De Vaus, 2002, p. 243; Wrenn, Loudon, & Stevens, 2002, p. 213), while others say to place it in the rows (Acock, 2008, p. 110; Huizingh, 2007, p. 246). In my experience more textbooks use the independent-in-columns convention, but I haven’t done any research on this.
Another consideration can be to avoid wasting space on paper. If you have 10 values for one variable and only two for another variable, it might save some space to place the variable with 10 values in the columns, provided it will fit on the page. If one variable has so many values that it doesn’t fit on the page when placed in the columns, then placing it in the rows might solve the problem.