Testing independence in a 2x2 table
Consider two events A and B. A and B are independent if P(A|B)=P(A), or equivalently if P(A and B)=P(A)*P(B). This definition assumes that we know the probabilities. In practice we do not; instead we have counts of the occurrences of each of the possible combinations of outcomes: A and B, A and not-B, not-A and B, and not-A and not-B. These counts form a 2x2 table, where the rows are defined by A and not-A and the columns by B and not-B. For example, consider data from the Physicians' Health Study (1988 NEJM 318: 262-264). Here we could define the events as A: took aspirin, not-A: took a placebo, B: had a heart attack (myocardial infarction, MI), not-B: no MI.
| | MI | no MI |
| Placebo | 189 | 10845 |
| Aspirin | 104 | 10933 |
The observed proportions within each row (estimates of P(MI|not-A) and P(MI|A) ) are:
> 189/(189+10845)
[1] 0.01712887
> 104/(104+10933)
[1] 0.00942285
The idea of the chi-squared test for independence is essentially to ask: how likely are we to see proportions at least this different if the two factors are really independent? If it is sufficiently unlikely, usually taken to mean `occurring less than 5 percent of the time', then we conclude that we have evidence of dependence.
To run the test in R, first enter the counts as a matrix. Note that matrix() fills the table column by column:
> Aspirin = matrix(c(189,104,10845,10933),ncol=2)
> Aspirin
     [,1]  [,2]
[1,]  189 10845
[2,]  104 10933
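The row proportions computed by hand earlier can also be read directly off the matrix. The following is a minimal sketch: the object name counts and the row and column labels are additions for readability and are not part of the original session.

```r
# Counts from the Physicians' Health Study table; matrix() fills by column.
counts <- matrix(c(189, 104, 10845, 10933), ncol = 2)

# Labels are an assumption added for readability only.
dimnames(counts) <- list(treatment = c("Placebo", "Aspirin"),
                         outcome   = c("MI", "no MI"))

# margin = 1 divides each cell by its row total,
# giving estimates of P(outcome | treatment).
row_props <- prop.table(counts, margin = 1)
print(row_props)
```

The MI column of row_props holds the same two proportions as the hand calculation, one for the placebo group and one for the aspirin group.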
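Now the chi-squared test. Setting correct=FALSE turns off the continuity correction that chisq.test() applies to 2x2 tables by default:
> chisq.test(Aspirin,correct=FALSE)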
Pearson's Chi-squared test
data: Aspirin
X-squared = 25.0139, df = 1, p-value = 5.692e-07
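The statistic reported by chisq.test() can be reproduced by hand, which may help make the idea of the test concrete. This is a sketch of the standard Pearson calculation, not necessarily how chisq.test() is implemented internally; the variable names are chosen here for illustration.

```r
# Observed counts, entered column by column as in the session above.
observed <- matrix(c(189, 104, 10845, 10933), ncol = 2)

# Under independence, the expected count in each cell is
# (row total * column total) / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson's statistic: sum over cells of (observed - expected)^2 / expected.
x2 <- sum((observed - expected)^2 / expected)

# A 2x2 table has (2 - 1) * (2 - 1) = 1 degree of freedom.
p_value <- pchisq(x2, df = 1, lower.tail = FALSE)

print(expected)
print(x2)
print(p_value)
```

The values of x2 and p_value should agree with the X-squared and p-value lines printed by chisq.test(Aspirin,correct=F).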
Alternatively, we can enter the data as a data frame with one row for each cell of the table:
> aspirin = c("no","yes","no","yes")
> MI = c("yes","yes","no","no")
> N = c(189,104,10845,10933)
> A = data.frame(aspirin,MI,N)
> A
aspirin MI N
1 no yes 189
2 yes yes 104
3 no no 10845
4 yes no 10933
Now we create the crosstabulation and compute the chi-squared:
> Aspirin = xtabs(N~aspirin+MI,data=A)
The xtabs() function creates the cross-tabulation. The two factors (aspirin and MI) define the table. N contains the counts for each cell of the table.
> Aspirin
       MI
aspirin    no   yes
    no  10845   189
    yes 10933   104
> chisq.test(Aspirin,correct=FALSE)
Pearson's Chi-squared test
data: Aspirin
X-squared = 25.0139, df = 1, p-value = 5.692e-07
You will note that the data are displayed in a slightly different order, because xtabs() arranges the levels of each factor alphabetically ("no" before "yes"). The values are the same, and the test statistic is identical.
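If the layout of the original matrix is preferred, the order of the levels can be set explicitly when the factors are created. The sketch below assumes the same data frame as above, rebuilt with factor() and a levels argument; the object names by_levels and as_matrix are illustrative only.

```r
# Same counts as before, but with the level order stated explicitly
# so that "yes" (MI) comes first in the columns.
A <- data.frame(
  aspirin = factor(c("no", "yes", "no", "yes"), levels = c("no", "yes")),
  MI      = factor(c("yes", "yes", "no", "no"), levels = c("yes", "no")),
  N       = c(189, 104, 10845, 10933)
)

# xtabs() respects the level order of each factor.
by_levels <- xtabs(N ~ aspirin + MI, data = A)
print(by_levels)

# Compare with the matrix entered directly.
as_matrix <- matrix(c(189, 104, 10845, 10933), ncol = 2)
same_layout <- all(as.vector(by_levels) == as.vector(as_matrix))

# The test statistic does not depend on the order of rows or columns.
stat_table  <- chisq.test(by_levels, correct = FALSE)$statistic
stat_matrix <- chisq.test(as_matrix, correct = FALSE)$statistic
print(c(stat_table, stat_matrix))
```

Reordering rows or columns only changes how the table is displayed; each cell keeps its own observed and expected count, so the chi-squared statistic is unaffected.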