Testing independence in a 2x2 table
Consider two events A and B. We know that A and B are independent events if P(A|B) = P(A), or equivalently if P(A and B) = P(A)*P(B). The definition, of course, assumes that we know the probabilities. In practice we don't know them. Instead, we have counts of the occurrences of each of the possible combinations of outcomes: A and B, A and not-B, not-A and B, and not-A and not-B. These counts form a 2x2 table in which the rows are defined by A and not-A and the columns by B and not-B. For example, consider data from the Physicians' Health Study (1988 NEJM 318: 262-264). Here we could define the events as A: took aspirin, not-A: took a placebo, B: had a heart attack (myocardial infarction, MI), not-B: no MI.
| | MI | no MI |
|---|---|---|
| Placebo | 189 | 10845 |
| Aspirin | 104 | 10933 |
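You can check the definition P(A and B) = P(A)*P(B) against this table by replacing each probability with its observed proportion. The sketch below is a minimal version in R. The object names (`obs`, `joint`, `product`) are our own and are only illustrative. If treatment and outcome were independent, the matrix of joint proportions should be close to the outer product of the row and column proportions. Deciding how close is "close enough" is the job of the chi-squared test.

```r
# Counts from the table: rows = Placebo, Aspirin; columns = MI, no MI
obs <- matrix(c(189, 104, 10845, 10933), ncol = 2)
n <- sum(obs)                        # total number of subjects

joint <- obs / n                     # estimated P(row event and column event)
p_row <- rowSums(obs) / n            # estimated P(Placebo), P(Aspirin)
p_col <- colSums(obs) / n            # estimated P(MI), P(no MI)

# Joint proportions we would expect if treatment and outcome were independent
product <- outer(p_row, p_col)

round(joint, 5)
round(product, 5)
```

Here the placebo/MI cell has a larger joint proportion than independence would predict, and the aspirin/MI cell has a smaller one.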
The observed proportions of MI within each row (estimates of P(MI|not-A) for placebo and P(MI|A) for aspirin) are:

```
> 189/(189+10845)
[1] 0.01712887
> 104/(104+10933)
[1] 0.00942285
```

The idea of the chi-squared test for independence is essentially to ask: how likely are we to see proportions at least this different if the two factors are really independent? If this is sufficiently unlikely, conventionally meaning it would happen less than 5 percent of the time (p < 0.05), then we conclude that we have evidence of dependence.
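Because the question is framed in terms of two row proportions, it can also be posed as a comparison of two proportions. As a cross-check, base R's `prop.test()` can be used; this is our addition and is not used in the original notes. With the continuity correction turned off, `prop.test()` computes the same Pearson chi-squared statistic as a 2x2 `chisq.test()`. The sketch below is minimal, and the vector names are illustrative.

```r
# Number of MIs and group sizes: placebo first, then aspirin
mi    <- c(placebo = 189, aspirin = 104)
total <- c(placebo = 189 + 10845, aspirin = 104 + 10933)

# Estimated P(MI | placebo) and P(MI | aspirin)
mi / total

# Test of equal proportions in the two groups, without Yates' correction
res <- prop.test(mi, total, correct = FALSE)
res$estimate    # the same two proportions
res$statistic   # Pearson chi-squared statistic, df = 1
res$p.value
```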
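In R, the counts can be entered as a matrix. Since matrix() fills column by column, the first column holds the MI counts and the second holds the no-MI counts:

```
> Aspirin = matrix(c(189,104,10845,10933),ncol=2)
> Aspirin
     [,1]  [,2]
[1,]  189 10845
[2,]  104 10933
```

Now the chi-squared test: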
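```
> chisq.test(Aspirin,correct=FALSE)

        Pearson's Chi-squared test

data:  Aspirin
X-squared = 25.0139, df = 1, p-value = 5.692e-07
```

Setting correct=FALSE turns off Yates' continuity correction, which chisq.test() otherwise applies to 2x2 tables. The p-value of about 5.7 × 10^-7 is far below 0.05, so these data give strong evidence that MI and aspirin use are not independent.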
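The X-squared value reported by chisq.test() can be reproduced directly. First, compute the expected count for each cell as (row total × column total) / grand total. Then sum (observed - expected)^2 / expected over the four cells, and refer the result to a chi-squared distribution with (2-1)(2-1) = 1 degree of freedom. The sketch below is minimal and uses base R only. The names `O`, `E`, and `X2` are illustrative.

```r
O <- matrix(c(189, 104, 10845, 10933), ncol = 2)   # observed counts

# Expected counts under independence: row total * column total / grand total
E <- outer(rowSums(O), colSums(O)) / sum(O)

# Pearson's statistic (no continuity correction)
X2 <- sum((O - E)^2 / E)

# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
df <- (nrow(O) - 1) * (ncol(O) - 1)

# Upper-tail probability: chance of a statistic at least this large
# if the two factors were independent
p <- pchisq(X2, df = df, lower.tail = FALSE)
c(X2 = X2, df = df, p = p)
```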
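Alternatively, the data can be stored in a data frame with one row per cell of the table. Each row gives the level of each factor plus the count N:

```
> aspirin = c("no","yes","no","yes")
> MI = c("yes","yes","no","no")
> N = c(189,104,10845,10933)
> A = data.frame(aspirin,MI,N)
> A
  aspirin  MI     N
1      no yes   189
2     yes yes   104
3      no  no 10845
4     yes  no 10933
```

Now we create the cross-tabulation and compute the chi-squared statistic: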
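```
> Aspirin = xtabs(N~aspirin+MI,data=A)
```

The xtabs() function creates the cross-tabulation. In the formula N ~ aspirin + MI, the two factors on the right-hand side (aspirin and MI) define the rows and columns of the table. N, on the left-hand side, contains the count for each cell of the table.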
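One way to see what the left-hand side N does is to expand the data frame to one row per subject and tabulate it with `table()`. The result should be the same cross-tabulation that xtabs(N ~ aspirin + MI) produces from the counts. This is only an illustrative check, not something needed in practice. The expanded data frame, here called `ind`, has 22,071 rows, and the names are our own.

```r
A <- data.frame(aspirin = c("no", "yes", "no", "yes"),
                MI      = c("yes", "yes", "no", "no"),
                N       = c(189, 104, 10845, 10933))

# Counts supplied on the left-hand side of the formula
tab_counts <- xtabs(N ~ aspirin + MI, data = A)

# Same table built from individual records: repeat each row N times
ind <- A[rep(seq_len(nrow(A)), times = A$N), c("aspirin", "MI")]
tab_records <- table(ind)

tab_counts
tab_records
```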
```
> Aspirin
       MI
aspirin    no   yes
    no  10845   189
    yes 10933   104
> chisq.test(Aspirin,correct=FALSE)

        Pearson's Chi-squared test

data:  Aspirin
X-squared = 25.0139, df = 1, p-value = 5.692e-07
```

The rows and columns are displayed in a different order, because the factor levels are sorted alphabetically by default and so "no" comes before "yes". However, the counts are the same and the test statistic is identical.
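You may prefer the layout of the original matrix, with the placebo row first and the MI column first. One option is to set the level order explicitly with `factor()` before cross-tabulating. The sketch below assumes the same data frame A as above and recreates it so the block runs on its own. The names `tab` and `m` are illustrative.

```r
A <- data.frame(aspirin = c("no", "yes", "no", "yes"),
                MI      = c("yes", "yes", "no", "no"),
                N       = c(189, 104, 10845, 10933))

# Put "yes" (MI) first in the columns, to match the matrix entered earlier;
# aspirin's default order ("no" = placebo, then "yes") already matches the rows
A$MI <- factor(A$MI, levels = c("yes", "no"))

tab <- xtabs(N ~ aspirin + MI, data = A)
tab

m <- matrix(c(189, 104, 10845, 10933), ncol = 2)
all(unclass(tab) == m)   # same cell counts in the same positions
```

Reordering rows or columns does not change the Pearson statistic, so chisq.test() gives the same result for `tab` and `m`.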