F tests and anova()
The F test is the basic tool for model comparison. Fit a full model and a restricted model, then compare them:
    Mf <- lm(y ~ x1+x2+x3+x4)
    Mr <- lm(y ~ x1+x2)
    anova(Mr,Mf)

Note: the models must be fit with the same observations.
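As a rough illustration of what anova() reports, the sketch below fits the two models to simulated data and then recomputes the F statistic by hand from the residual sums of squares. The data frame `d`, its coefficients, and the helper names `rss_r`, `rss_f`, and `F_hand` are made up for this example and are not part of the course data.

```r
# Simulated data; every name and number here is illustrative only.
set.seed(1)
n <- 100
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- 1 + 2 * d$x1 - d$x2 + 0.5 * d$x3 + rnorm(n)

Mf <- lm(y ~ x1 + x2 + x3 + x4, data = d)  # full model
Mr <- lm(y ~ x1 + x2, data = d)            # restricted model (x3 and x4 dropped)

tab <- anova(Mr, Mf)   # restricted model first, full model second
print(tab)

# F by hand: (drop in RSS per extra parameter) / (full-model RSS per residual df)
rss_r <- sum(resid(Mr)^2)
rss_f <- sum(resid(Mf)^2)
F_hand <- ((rss_r - rss_f) / (df.residual(Mr) - df.residual(Mf))) /
  (rss_f / df.residual(Mf))

c(by_hand = F_hand, from_anova = tab$F[2])

# Quick check that both fits used the same number of observations
nobs(Mr) == nobs(Mf)
```

If the predictors in the full model have missing values that the restricted model's predictors do not, lm() silently drops different rows for each fit, which is one common way the "same observations" requirement gets violated.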
Assignment
Now fit the following linear models on the log scale:
Use the F-test (anova() function) to compare the models. Which is the best model?
There are three classes: birds, fish, and mammals. Which group is the baseline group represented by the intercept term? Which group has the highest average log(Brain.WT) after controlling for log(Body.WT)? Which species seem unusually "big brained" or especially "small brained" relative to their groups? It will be helpful to look at plots with different symbols or colors for the different groups!
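One possible way to set up such a plot is sketched below on made-up data. The data frame `animals` and the column name `Class` are hypothetical stand-ins; only Brain.WT and Body.WT are names taken from the assignment, and the simulated values say nothing about the real answers.

```r
# Made-up data for illustration; `animals` and `Class` are hypothetical names.
set.seed(2)
animals <- data.frame(
  Class   = factor(rep(c("bird", "fish", "mammal"), each = 10)),
  Body.WT = exp(rnorm(30, mean = 2))
)
animals$Brain.WT <- exp(0.5 + 0.6 * log(animals$Body.WT) +
  0.8 * (animals$Class == "mammal") + rnorm(30, sd = 0.3))

# With R's default treatment contrasts, the first factor level is the
# baseline absorbed into the intercept.
levels(animals$Class)[1]

fit <- lm(log(Brain.WT) ~ log(Body.WT) + Class, data = animals)
coef(fit)

# One plotting symbol and one color per group
grp <- as.integer(animals$Class)
plot(log(animals$Body.WT), log(animals$Brain.WT), pch = grp, col = grp,
     xlab = "log(Body.WT)", ylab = "log(Brain.WT)")
legend("topleft", legend = levels(animals$Class),
       pch = seq_along(levels(animals$Class)),
       col = seq_along(levels(animals$Class)))

# Largest positive and negative residuals: candidates for unusual brain size
head(sort(resid(fit), decreasing = TRUE), 3)
head(sort(resid(fit)), 3)
```

Sorting the residuals is only one rough way to flag unusual cases; looking at the labeled plot is usually more informative.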
    FL <- read.csv("http://people.reed.edu/~jones/141/FL.dat")

Read the data description on the 141 website!!
We are primarily interested in over- and under-voted ballots. Define the new variable NoVote <- over+under. Your mission is to relate the sum of over and under vote counts to explanatory factors including voting technology (Tech), ballot layout (Layout), number of ballots cast, and possible socio-economic correlates of voting efficacy: education (PctHS or PctColGrad), percent elderly population, poverty, unemployment, median household income.
    lmF <- lm(log(NoVote) ~ Tech*log(Ballots) + Tech*Pct65 + Tech*PctHS)

Examine the summary table: why are some coefficients not estimated? Hint: look at the names of the Tech interactions. Omit the two counties causing the trouble, and rerun the full model.
    lmR <- lm(log(NoVote) ~ Tech + log(Ballots) + PctHS)

Note: I haven't included the case numbers of the counties to be omitted! You need to omit the same cases in the restricted model that you did in the full model. Examine the summary table.
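One way to make sure both models see exactly the same counties is to build a single reduced data frame and fit both models from it. The sketch below does this on made-up data: the data frame `cty`, the Tech labels, the simulated counts, and the row numbers in `drop_rows` are all hypothetical, and the real case numbers still have to be found from the summary table.

```r
# Made-up county-level data; values, Tech labels, and dropped rows are hypothetical.
set.seed(3)
n <- 40
cty <- data.frame(
  Tech    = factor(rep(c("Lever", "Optical", "Punch"), length.out = n)),
  Ballots = round(exp(rnorm(n, mean = 10))),
  Pct65   = runif(n, 10, 30),
  PctHS   = runif(n, 60, 90)
)
cty$over  <- rpois(n, lambda = 0.01 * cty$Ballots) + 1
cty$under <- rpois(n, lambda = 0.02 * cty$Ballots) + 1
cty$NoVote <- cty$over + cty$under   # combined over- and under-votes

drop_rows <- c(5, 17)        # hypothetical case numbers, not the real ones
keep <- cty[-drop_rows, ]    # one shared data frame for both fits

lmF <- lm(log(NoVote) ~ Tech * log(Ballots) + Tech * Pct65 + Tech * PctHS,
          data = keep)
lmR <- lm(log(NoVote) ~ Tech + log(Ballots) + PctHS, data = keep)

nobs(lmF) == nobs(lmR)   # both fits use the same counties
cmp <- anova(lmR, lmF)
print(cmp)
```

Passing the same `subset =` argument to both lm() calls would work as well; a shared data frame just makes it harder to forget the restriction in one of the two fits.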