Testing independence in a 2x2 table

Consider two events A and B. We say that A and B are independent if P(A|B) = P(A), or equivalently if P(A and B) = P(A)*P(B). The definition of course assumes that we know the probabilities. In practice, we don't. Instead we have counts of the occurrences of each of the possible combinations of outcomes: A and B, A and not-B, not-A and B, and not-A and not-B. These counts form a 2x2 table, where the rows are defined by A and not-A, the columns by B and not-B. For example, consider data from the Physicians' Health Study (1988 NEJM 318: 262-264). Here we could define the events as A: took aspirin, not-A: took a placebo, B: had a heart attack (myocardial infarction, MI), not-B: no MI.

             MI   no MI
Placebo     189   10845
Aspirin     104   10933

The observed proportions of MI within each row (estimates of P(MI|not-A) and P(MI|A), respectively) are:

> 189/(189+10845)
[1] 0.01712887

> 104/(104+10933)
[1] 0.00942285

The idea of the chi-squared test for independence is essentially to ask the question: how unlikely are we to see proportions this different if the two factors are really independent? If it is sufficiently unlikely, usually taken to mean "occurring less than 5 percent of the time", then we conclude that we have evidence of dependence.
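To make the question concrete, here is a minimal sketch of what independence would predict for this table. If the two factors were independent, the expected count in each cell would be (row total * column total) / grand total, and Pearson's chi-squared statistic measures how far the observed counts are from those expected counts. The object names observed, expected and X2 are ours, chosen only for illustration.

```r
# Observed counts from the Physicians' Health Study table
# (entered column by column: the MI column first, then the no-MI column)
observed <- matrix(c(189, 104, 10845, 10933), nrow = 2)

# Expected counts if treatment and MI were independent:
# (row total * column total) / grand total, for each cell
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
expected

# Pearson's statistic: sum over cells of (observed - expected)^2 / expected
X2 <- sum((observed - expected)^2 / expected)
X2
```

The larger X2 is, the harder it is to reconcile the observed table with independence.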

The chi-squared test in R

We will walk through the process step by step: creation of the dataset, computing the chi-squared test statistic, and evaluation of the results.

Create the dataset

There are two methods, one quick and dirty, the other creating a more self-explanatory dataset with labels.
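The two approaches might look something like the sketch below. This is our reconstruction, not necessarily the code originally shown here: the object names mi and mi.labeled and the dimension names Treatment and Outcome are hypothetical. Note that matrix() fills column by column, so the two MI counts come first.

```r
# Quick and dirty: counts entered column by column, no labels
mi <- matrix(c(189, 104, 10845, 10933), nrow = 2)
mi

# Self-explanatory: the same counts with row and column labels
# (label names are our choice, for illustration)
mi.labeled <- matrix(c(189, 104, 10845, 10933), nrow = 2,
                     dimnames = list(Treatment = c("Placebo", "Aspirin"),
                                     Outcome = c("MI", "no MI")))
mi.labeled
```

With labels, a cell can be pulled out by name, for example mi.labeled["Aspirin", "MI"], which makes it harder to mix up rows and columns.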

What does it mean?

The chi-squared statistic is 25.01 with one degree of freedom (we will explain degrees of freedom later!). The p-value is the desired probability: the chance of seeing a statistic at least this large if the two factors really are independent. If it is less than .05 we infer dependence. If it is not less than .05 we have failed to demonstrate dependence, which is not the same as demonstrating independence! Here it is roughly .0000005, which is really tiny. In other words, we have evidence that taking aspirin and having a heart attack are not independent. The proportion of subjects who had an MI while taking aspirin was roughly half that of the placebo group (0.0094 versus 0.0171).
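The figures quoted above can be reproduced with a sketch along the following lines. We assume correct = FALSE, which turns off the continuity correction that chisq.test applies to 2x2 tables by default; that choice is our inference, made because it yields the 25.01 quoted here. The object names mi and result are ours.

```r
# The 2x2 table of counts, with labels
mi <- matrix(c(189, 104, 10845, 10933), nrow = 2,
             dimnames = list(c("Placebo", "Aspirin"), c("MI", "no MI")))

# correct = FALSE switches off the Yates continuity correction
# (an assumption on our part, chosen to match the quoted statistic)
result <- chisq.test(mi, correct = FALSE)
result

# The p-value is the upper tail area of the chi-squared distribution
# with 1 degree of freedom, beyond the observed statistic
pchisq(unname(result$statistic), df = 1, lower.tail = FALSE)
```

Leaving the correction on gives a slightly smaller statistic and a slightly larger p-value, but the conclusion is the same.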

Homework: simulation!



Albyn Jones
September 2005