Example: Heart Disease

This example makes use of data from the Framingham Longitudinal Study:

"Framingham" <- structure(.Data = list(
"present" = c(2, 3, 3, 4, 3, 2, 0, 3, 8, 11, 6, 6, 7, 12, 11, 11), 
"absent" = c(117, 121, 47, 22, 85, 98, 43, 20, 119, 209, 68, 43, 
             67, 99, 46, 33), 
"SC" = structure(
   .Data = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4), 
    levels = c("<200", "200-219", "220-259", "260+"), class = "factor"), 
"BP" = structure(
    .Data = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4), 
    levels = c("<127", "127-146", "147-166", "167+"), class = "factor")), 
names = c("present", "absent", "SC", "BP"), 
row.names = c("1", "2", "3", "4", "5", "6", "7", "8", 
         "9", "10", "11", "12", "13", "14", "15", "16"), 
class = "data.frame")

You can simply paste the data above into your R session, then type "Framingham" (the name of the data.frame) to display the data. The variables are

present number of cases with Coronary Heart Disease (CHD)
absent number of cases without CHD
SC serum cholesterol level
BP blood pressure level

There are several models that we might be interested in fitting here.

m0: Y ~ 1
m1a: Y ~ SC
m1b: Y ~ BP
m2: Y ~ SC+BP
m3: Y ~ SC*BP

In each case, the response "Y" is the matrix "cbind(present,absent)". For example, to fit the model "m2" use:

m2 <- glm(cbind(present,absent) ~ SC+BP,Framingham,family=binomial)

The anova function produces the information needed to test null hypotheses that arise by restricting the parameters of a model. For example, m1a is a restriction of m2: all the parameters for BP have been set to zero. The following schematic diagram represents the nesting relationships between the models; comparable models are linked by vertical or diagonal lines:

Models "m1a" and "m1b" are not comparable, since neither is a restriction of the other. You should look at the summary output for each model.

Here is a table giving for each model, the residual df and the residual deviance.

	Analysis of Deviance Table

	model	df	deviance
	m0      15      58.76
	m1a     12	26.80
	m1b	12	35.16
	m2	 9	 8.07
	m3	 0       0.00

Notice that model "m3" is a saturated model: there is one parameter for each case, and thus no residual df, and no residual deviance. In effect, model "m3" has fit each case with its sample proportion. Comparing model "m2" to model "m3" may be done by comparing the deviance to the 95th percentile of the chi-squared distribution with 9df. Since the expected value of a Chi-squared random variable is its degrees of freedom, the deviance is clearly not large. We can conclude that model "m2" fits the data well, or to put it in more formal language, we can not reject the null hypothesis that there is no interaction between SC and BP at the 5% level.

Compare model "m1a" to model "m2"; you should conclude that model "m1a" does not fit well, compared to "m2". Formally speaking, we can reject the null hypothesis that BP has no effect on CHD.

Up: Math 141 Index

Inference

Albyn Jones
jones@reed.edu