The three-way table below presents admissions data from the University of California, Berkeley in 1973, classified by the variables department (A through F), gender (male, female), and outcome (admitted or denied, encoded as Yes and No).
Department | Male Yes | Male No | Female Yes | Female No |
---|---|---|---|---|
A | 512 | 313 | 89 | 19 |
B | 313 | 207 | 17 | 8 |
C | 120 | 205 | 202 | 391 |
D | 138 | 279 | 131 | 244 |
E | 53 | 138 | 94 | 299 |
F | 22 | 351 | 24 | 317 |
All | 1158 | 1493 | 557 | 1278 |
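An analysis of just the variables gender and outcome shows a correlation that suggests gender bias: the proportion of women admitted (557 of 1835, about 30%) was substantially lower than the proportion of men admitted (1158 of 2651, about 44%). However, when the department variable is taken into account, the apparent bias against women largely disappears: women were admitted at a higher rate than men in four of the six departments (A, B, D, and F) and at a slightly lower rate in the other two. The aggregate gap arises because women generally applied to the more competitive departments, those with low admission rates.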
The phenomenon in which a correlation between two variables disappears, or even reverses, when a third variable is taken into account is known as Simpson's paradox.
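
The comparison can be checked directly from the counts in the table above. The following is a minimal sketch in Python, not part of the original analysis; the names `counts`, `rate`, and `by_dept` are chosen here for illustration only. It computes the admission rate for each gender over all departments combined and then within each department.

```python
# Counts copied from the table above:
# department -> (male yes, male no, female yes, female no)
counts = {
    "A": (512, 313, 89, 19),
    "B": (313, 207, 17, 8),
    "C": (120, 205, 202, 391),
    "D": (138, 279, 131, 244),
    "E": (53, 138, 94, 299),
    "F": (22, 351, 24, 317),
}


def rate(yes, no):
    """Proportion of applicants admitted."""
    return yes / (yes + no)


# Aggregate view: ignore department and pool all applicants by gender.
male_yes = sum(c[0] for c in counts.values())
male_no = sum(c[1] for c in counts.values())
female_yes = sum(c[2] for c in counts.values())
female_no = sum(c[3] for c in counts.values())

overall_male = rate(male_yes, male_no)
overall_female = rate(female_yes, female_no)
print(f"All departments: male {overall_male:.1%}, female {overall_female:.1%}")

# Conditional view: compare the two rates within each department.
by_dept = {
    dept: (rate(my, mn), rate(fy, fn))
    for dept, (my, mn, fy, fn) in counts.items()
}
for dept, (male_rate, female_rate) in by_dept.items():
    higher = "female" if female_rate > male_rate else "male"
    print(f"{dept}: male {male_rate:.1%}, female {female_rate:.1%} ({higher} higher)")
```

Pooling the departments weights each gender's rate by where its applicants applied, which is why the aggregate rates can point in a different direction from most of the within-department rates.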