Residual Plots
We can look at the fitted line in several useful ways. Perhaps the most natural is simply to plot the fitted line over the scatterplot of the two variables. Recall that the abline() function adds a line to an existing plot; we just have to specify the intercept and slope. We could simply type in the two numbers, e.g. "abline(51.6,-0.0073)", or we can use the coef() function, which extracts the coefficients of the fitted line from the lm data structure:
    plot(Wgt,Hmpg)
    abline(coef(lm.mpg))

Note: if you get an error message stating "Object "Wgt" not found", you need to attach the data.frame.

How does it look? We could also add lines above and below the fitted line to indicate the range we expect to cover most of the data, e.g. 2 standard deviations above and below the line. Can you see how to do this?
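One possible answer to the question above, offered as a sketch rather than the intended solution: shift the intercept of the fitted line up and down by twice the residual standard deviation and keep the slope unchanged. The course data frame is not available here, so the built-in mtcars data set is used as a stand-in (an assumption), and the names s, b and inside are illustrative only.

```r
# Stand-in data (assumption): built-in mtcars replaces the course data frame.
Wgt  <- mtcars$wt * 1000   # weight in pounds
Hmpg <- mtcars$mpg         # miles per gallon (not specifically highway mileage)

lm.mpg <- lm(Hmpg ~ Wgt)

s <- summary(lm.mpg)$sigma # residual standard deviation estimated by the fit
b <- coef(lm.mpg)          # b[1] is the intercept, b[2] is the slope

plot(Wgt, Hmpg)
abline(b)                                    # fitted line
abline(a = b[1] + 2 * s, b = b[2], lty = 3)  # 2 standard deviations above
abline(a = b[1] - 2 * s, b = b[2], lty = 3)  # 2 standard deviations below

# Fraction of the cases that fall inside the band
inside <- mean(abs(residuals(lm.mpg)) <= 2 * s)
inside
```

If the model is adequate and the errors are roughly normal, we would expect about 95% of the points to lie between the two dotted lines.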
Much more useful for diagnostic purposes is the plot of residuals against fitted values:
    plot(fitted.values(lm.mpg),residuals(lm.mpg))
    abline(h=0)
    abline(h=c(-2,2)*3.139,lty=3)

This plot will often reveal several kinds of model failure, such as non-linearity of the relationship, outliers, and non-constant variance. The fitted.values() and residuals() functions extract their namesakes from the lm data structure. We could also compute the fitted values directly, by using the fitted coefficients in the formula for the line:
    fit <- 51.60137 - 0.007327059*Wgt
    res <- Hmpg - fit

You should compare these objects to the results of the extraction functions. In the long run, it will be easier to use the extraction functions, especially when we start working with more complicated models.
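The comparison suggested above might be carried out as in the following sketch. It again uses the built-in mtcars data as a stand-in for the course data (an assumption), takes the coefficients from coef() rather than typing rounded values, and also recomputes the residual standard deviation by hand. The hard-coded 3.139 in the residual plot is presumably this quantity for the course fit. The names same_fit, same_res, s_hand and s_fit are illustrative.

```r
# Stand-in data (assumption): built-in mtcars replaces the course data frame.
Wgt  <- mtcars$wt * 1000
Hmpg <- mtcars$mpg
lm.mpg <- lm(Hmpg ~ Wgt)

# Fitted values and residuals by hand, using full-precision coefficients
b   <- coef(lm.mpg)
fit <- b[1] + b[2] * Wgt
res <- Hmpg - fit

# Compare with the extraction functions (unname() drops the case labels)
same_fit <- isTRUE(all.equal(unname(fit), unname(fitted.values(lm.mpg))))
same_res <- isTRUE(all.equal(unname(res), unname(residuals(lm.mpg))))

# Residual standard deviation by hand: sqrt(residual sum of squares / residual df)
s_hand <- sqrt(sum(res^2) / df.residual(lm.mpg))
s_fit  <- summary(lm.mpg)$sigma

# Residual plot with a band computed from the fit instead of a typed-in number
plot(fit, res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0)
abline(h = c(-2, 2) * s_fit, lty = 3)
```

Typing in rounded coefficients, as in the text, gives values that differ slightly from those returned by the extraction functions, which is one more reason to prefer the latter.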
We should also make a normal probability plot of the residuals:
    qqnorm(residuals(lm.mpg))
    qqline(residuals(lm.mpg))

Finally, the plot function has a default set of plots appropriate for different classes of objects. Try the following:
    par(mfrow=c(2,2))
    plot(lm.mpg)

Four plots are produced. We have already discussed the first two. The "scale-location" plot is designed to reveal trends in the magnitudes of residuals. The "Cook's Distance" plot displays a measure of the effect of each case on the estimated coefficients: a case with a large Cook's distance has a relatively large impact on the values of the coefficients. (In recent versions of R the fourth default panel is "Residuals vs Leverage", with Cook's distance shown as contours; plot(lm.mpg, which=4) draws the Cook's distance plot itself.)
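To see what a large Cook's distance means in practice, one can extract the distances with cooks.distance(), drop the most influential case, and refit. The sketch below does this on the built-in mtcars data, used as a stand-in for the course data (an assumption). The names cd, worst and lm.drop are illustrative.

```r
# Stand-in data (assumption): built-in mtcars replaces the course data frame.
Wgt  <- mtcars$wt * 1000
Hmpg <- mtcars$mpg
lm.mpg <- lm(Hmpg ~ Wgt)

# Cook's distance for every case, and the three largest values
cd <- cooks.distance(lm.mpg)
head(sort(cd, decreasing = TRUE), 3)

# The Cook's distance plot on its own
plot(lm.mpg, which = 4)

# Refit without the most influential case and compare the coefficients
worst   <- which.max(cd)
lm.drop <- lm(Hmpg ~ Wgt, subset = -worst)
rbind(all = coef(lm.mpg), dropped = coef(lm.drop))
```

The larger the Cook's distance of the omitted case, the more the two rows of coefficients should differ.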