p = exp(a + b*X)/(1+exp(a+b*X))
We could fit this model by least squares, by transforming the responses to proportions,

P = Y/n

applying the logit transform,

log( P/(1-P) ) = a + b*X

and fitting by least squares. Unfortunately, the transformed response still has non-constant variance.
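As a sketch of that transformation (the grouped counts here are hypothetical, not the Framingham data used later):

```r
# Hypothetical grouped data: Y successes out of n trials at each value of X
Y <- c(2, 5, 9, 14)
n <- c(20, 20, 20, 20)
X <- c(1, 2, 3, 4)

P <- Y / n                    # observed proportions
logitP <- log(P / (1 - P))    # the transformed response
fit <- lm(logitP ~ X)         # least-squares fit on the transformed scale
coef(fit)                     # rough estimates of a and b
```

Note that the transform fails outright when P is 0 or 1 (the logit is infinite), a further drawback beyond the non-constant variance.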
The method of estimation we will use is called "Maximum Likelihood Estimation", MLE for short. MLE amounts to choosing the values of the parameters that maximize the probability of the observed data. In practice, this is accomplished by an algorithm similar to that used in nonlinear least squares: start with an initial guess, and then alter the parameter values to increase the objective function, i.e. the probability of the observed data. For a wide class of distributions, which includes the normal, binomial, Poisson, and gamma distributions, the initial guesses can be made automatically, relieving us of the problem of choosing them.
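A minimal sketch of the idea, using hypothetical data and a general-purpose optimizer (glm() actually uses iteratively reweighted least squares, but the maximum it finds is the same):

```r
# Hypothetical grouped binomial data: Y successes out of n trials at predictor X
Y <- c(2, 5, 9, 14); n <- rep(20, 4); X <- 1:4

# Negative log-likelihood of the logistic model p = exp(a+b*X)/(1+exp(a+b*X))
negloglik <- function(theta) {
  p <- plogis(theta[1] + theta[2] * X)   # plogis() is the logistic function
  -sum(dbinom(Y, n, p, log = TRUE))      # minus the log probability of the data
}

# Start from an initial guess and let the optimizer climb the likelihood
mle <- optim(c(0, 0), negloglik)

# glm() locates the same maximum
fit <- glm(cbind(Y, n - Y) ~ X, family = binomial)
```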
topic | The Linear Model | Logistic Regression |
---|---|---|
call | lm(Y ~ X, data=DataFrame) | glm(cbind(Y,N-Y) ~ X, family=binomial, data=DataFrame) |
Summary Table | summary() | summary() |
Residuals | residuals() | residuals() |
Fitted Values | fitted() | fitted() |
Model Comparisons | anova(): F-test | anova(): Chi-squared test |
Case Diagnostics | cooks.distance(), hatvalues(), dfbetas(), etc. | cooks.distance(), hatvalues(), dfbetas(), etc. |
This example makes use of data from the Framingham Longitudinal Study:
"Framingham" <- structure(.Data = list( "present" = c(2, 3, 3, 4, 3, 2, 0, 3, 8, 11, 6, 6, 7, 12, 11, 11), "absent" = c(117, 121, 47, 22, 85, 98, 43, 20, 119, 209, 68, 43, 67, 99, 46, 33), "SC" = structure( .Data = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4), levels = c("<200", "200-219", "220-259", "260+"), class = "factor"), "BP" = structure( .Data = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4), levels = c("<127", "127-146", "147-166", "167+"), class = "factor")), names = c("present", "absent", "SC", "BP"), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16"), class = "data.frame")You can simply paste the data above into your R session, then type "Framingham" (the name of the data.frame) to display the data. The variables are
There are several models that we might be interested in fitting here.
m2 <- glm(cbind(present,absent) ~ SC+BP,Framingham,family=binomial)
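The other four models are not written out in the text, but the discussion below identifies them: "m0" is the intercept-only model, "m1a" and "m1b" add SC and BP separately, and "m3" adds the SC by BP interaction. A reconstruction (the data frame here is a compact version of the Framingham data above):

```r
# The Framingham data from the text, in compact form
Framingham <- data.frame(
  present = c(2,3,3,4, 3,2,0,3, 8,11,6,6, 7,12,11,11),
  absent  = c(117,121,47,22, 85,98,43,20, 119,209,68,43, 67,99,46,33),
  SC = factor(rep(c("<200","200-219","220-259","260+"), each = 4),
              levels = c("<200","200-219","220-259","260+")),
  BP = factor(rep(c("<127","127-146","147-166","167+"), times = 4),
              levels = c("<127","127-146","147-166","167+"))
)

m0  <- glm(cbind(present,absent) ~ 1,     Framingham, family=binomial)
m1a <- glm(cbind(present,absent) ~ SC,    Framingham, family=binomial)
m1b <- glm(cbind(present,absent) ~ BP,    Framingham, family=binomial)
m2  <- glm(cbind(present,absent) ~ SC+BP, Framingham, family=binomial)
m3  <- glm(cbind(present,absent) ~ SC*BP, Framingham, family=binomial)
```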
The anova function produces the information needed to test null hypotheses that arise by restricting the parameters of a model. For example, m1a is a restriction of m2: all the parameters for BP have been set to zero. The following schematic diagram represents the nesting relationships between the models; comparable models are linked by vertical or diagonal lines:
        m0
       /  \
    m1a    m1b
       \  /
        m2
        |
        m3

Models "m1a" and "m1b" are not comparable, since neither is a restriction of the other. You should look at the summary output for each model.
Here is a table giving, for each model, the residual df and the residual deviance.

Analysis of Deviance Table

model | df | deviance |
---|---|---|
m0 | 15 | 58.76 |
m1a | 12 | 26.80 |
m1b | 12 | 35.16 |
m2 | 9 | 8.07 |
m3 | 0 | 0.00 |
> pchisq(8.076, 9, lower.tail=FALSE)
[1] 0.5265058

You can ask anova() to compute the p-value:
> anova(m2, m3, test="Chisq")
Analysis of Deviance Table

Model 1: cbind(present, absent) ~ SC + BP
Model 2: cbind(present, absent) ~ SC * BP
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1         9     8.0762
2         0     0.0000  9   8.0762   0.5265

The Chi-squared test is really a Likelihood Ratio test, so you can do the same analysis with:
anova(m2, m3, test="LRT")

It is useful to know that the expected value of a Chi-squared random variable is equal to its degrees of freedom.
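A quick simulated check of that fact (the sample mean is only approximately 9):

```r
set.seed(1)                  # for reproducibility
x <- rchisq(100000, df = 9)  # draws from a chi-squared distribution with 9 df
mean(x)                      # close to 9, the degrees of freedom
```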
Notice that model "m3" is a saturated model: there is one parameter for each case, and thus no residual df and no residual deviance. In effect, model "m3" has fit each case with its sample proportion. Comparing model "m2" to model "m3" may be done by comparing the deviance to the 95th percentile of the chi-squared distribution with 9 df. Since the expected value of a chi-squared random variable is its degrees of freedom, the deviance is clearly not large. We can conclude that model "m2" fits the data well, or to put it in more formal language, we cannot reject the null hypothesis that there is no interaction between SC and BP at the 5% level.
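Concretely, using the deviance of 8.076 from the table above:

```r
qchisq(0.95, df = 9)                      # the 95th percentile, about 16.92
8.076 < qchisq(0.95, df = 9)              # TRUE: deviance below the critical value
pchisq(8.076, df = 9, lower.tail = FALSE) # the p-value, about 0.53, as before
```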
Compare model "m1a" to model "m2"; you should conclude that model "m1a" does not fit well, compared to "m2". Formally speaking, we can reject the null hypothesis that BP has no effect on CHD.
Albyn Jones