Next: Example

Up: Math 141 Index

Back: the glm function


Inference

Inference for Generalized Linear Models is almost completely analogous to inference for linear models fit by least squares, and of course very similar to inference for normal theory models fit by non-linear least squares. This should not be too surprising, since the normal theory linear model is a special case of the GLM.

Confidence Intervals and Tests for a Single Parameter

Tests and confidence intervals for single parameters in a GLM resemble the procedures for single coefficients in the standard linear model. The important difference is that here they are based on the asymptotic (i.e., large-sample) normality of parameter estimates that are MLEs. The standard errors come from the second derivative of the log-likelihood function at its maximum, i.e., a measure of how rapidly the function decreases as one moves away from the peak. This does not depend directly on the residual degrees of freedom as it does in the linear model, but rather on the validity of the asymptotic approximation. There is therefore no natural degrees of freedom with which to define t-statistics; the statistics are really approximate "Z scores". Thus a 95% confidence interval for a parameter uses the critical value from the normal distribution (1.96) rather than from a t distribution.

Example: Here is the coefficient table from the summary of the model fit to the skin cancer dataset.

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -11.65945    0.44872 -25.984  < 2e-16 ***
agegrp25-34   2.63035    0.46747   5.627 1.84e-08 ***
agegrp35-44   3.84803    0.45467   8.463  < 2e-16 ***
agegrp45-54   4.59674    0.45104  10.191  < 2e-16 ***
agegrp55-64   5.08987    0.45031  11.303  < 2e-16 ***
agegrp65-74   5.64997    0.44976  12.562  < 2e-16 ***
agegrp75-84   6.06538    0.45035  13.468  < 2e-16 ***
agegrp85+     6.18188    0.45782  13.503  < 2e-16 ***
Ft.Worth      0.80654    0.05228  15.426  < 2e-16 ***
The (approximate!) 95% confidence interval (CI) for the "Ft.Worth" parameter, which measures the difference between St. Paul and Ft. Worth in the log odds scale, is
	(.8065 - 1.96*.0522, .8065 + 1.96*.0522) = (.704, .909)
Corresponding to the 95% CI is the hypothesis test, at the 5% significance level, of the null hypothesis that the coefficient is zero; the "z value" reported in the table is just (after rounding)
	(.8065 - 0)/.0522 = 15.43
and should be compared to 1.96 in magnitude. This one is clearly larger than 1.96, so we would reject the null hypothesis at the 5% significance level. The p-value should be computed with the "pnorm" function.
	2*(1-pnorm(15.43))
Recall that pnorm computes the area under the normal curve to the left of the argument (15.43 here), while the p-value is twice the area to the right, or in general, beyond that value. It's essentially 0.
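The interval and test above are easy to reproduce directly in R. Here is a minimal sketch; the estimate and standard error are hard-coded from the summary table rather than extracted from the fitted model object:

```r
## Wald 95% CI and z test for the Ft.Worth coefficient,
## using the estimate and standard error from the summary table
est <- 0.80654
se  <- 0.05228
ci  <- est + c(-1, 1) * qnorm(0.975) * se  # approximate 95% CI
z   <- est / se                            # the reported z value
p   <- 2 * (1 - pnorm(abs(z)))             # two-sided p-value
round(ci, 3)  # about (0.704, 0.909)
round(z, 2)   # 15.43
```

In practice, confint.default() applied to the fitted glm object computes these Wald intervals for all coefficients at once.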

Comparing Models

Models fit by maximum likelihood can be compared using the "Likelihood Ratio Test" (LRT). The calculations are implemented in R through the "anova" function, analogous to its use with the lm function.

As with linear models with normally distributed errors, we can compare nested models; that is, one model must be a restriction of the other, usually obtained by setting one or more parameters equal to zero. Suppose that logLf is the logarithm of the likelihood function for the full model, and logLr is the logarithm of the likelihood function for the restricted model. The basic result is that, when the restricted model is correct, twice the difference of the log-likelihoods, 2*(logLf - logLr), has approximately a Chi-squared distribution, with degrees of freedom equal to the difference in the number of free parameters.

In R, the summary function reports a quantity called the deviance for the model; actually it's labeled "residual deviance". The residual deviance is -2 times the log-likelihood for the model, up to an additive constant (involving the saturated model) that is the same for any model fit to the same data, and therefore cancels when models are compared. Thus we can compare nested models simply by subtracting their residual deviances. That is essentially all that the anova() function does in the case of glm output for the binomial family: subtract the deviances, and report the difference, along with the difference in degrees of freedom.
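As a sketch of the arithmetic, here is the deviance subtraction done by hand, using the residual deviances and degrees of freedom from the skin cancer example below (158.824 on 13 df for the age model, 8.146 on 7 df for the agegrp model):

```r
## LRT by subtracting residual deviances: restricted (age) model
## versus full (agegrp) model, numbers from the anova table below
dev.r <- 158.824; df.r <- 13   # restricted model
dev.f <-   8.146; df.f <-  7   # full model
lrt  <- dev.r - dev.f          # 150.678, the "Deviance" anova reports
df   <- df.r - df.f            # 6
pval <- 1 - pchisq(lrt, df)    # essentially 0
```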


Example: Let's test the assumption that the log odds are linear in age for the skin cancer data, by comparing the first model (using age) to the second (using agegrp):
anova(sc.glm,sc1.glm)
should produce
Analysis of Deviance Table

Model 1: cbind(cases, n - cases) ~ age + Ft.Worth
Model 2: cbind(cases, n - cases) ~ agegrp + Ft.Worth
  Resid. Df Resid. Dev Df Deviance
1        13    158.824            
2         7      8.146  6   150.68
To evaluate the result, just compute the rejection point for a 5% significance level:
qchisq(.95,6)
The reduction in deviance (150.68, 6 df) is much larger than 12.59, so we reject the null hypothesis that the log odds are linear in age.
The residual deviance for the "age" model is about 159, with 13 degrees of freedom. You should be able to compute the df yourself if necessary: 16 cells minus 3 parameters yields 13 df.
We can ask R to compute the p-values for us in the anova function:
> anova(sc.glm,sc1.glm,test="Chisq")
Analysis of Deviance Table

Model 1: cbind(cases, n - cases) ~ age + Ft.Worth
Model 2: cbind(cases, n - cases) ~ agegrp + Ft.Worth
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
1        13    158.824                          
2         7      8.146  6   150.68 < 2.2e-16 ***

Finally, we can use the anova function to construct the analog of an anova table for a single model:

anova(sc1.glm)
yields
Analysis of Deviance Table

Binomial model

Response: cbind(cases, n - cases)

Terms added sequentially (first to last)
         Df Deviance Resid. Df Resid. Dev 
    NULL                    15   2794.117
  agegrp  7 2526.629         8    267.488
Ft.Worth  1  259.342         7      8.146
The second column, labeled "Deviance", is the reduction in the residual deviance achieved by adding the corresponding term to the model. For example, the factor "agegrp" has 7 df and reduces the deviance by 2526.6, while adding the factor "Ft.Worth" further reduces the deviance by 259.3, at the cost of 1 df. At the end there are 7 df left, with a residual deviance of 8.1; we can conclude that the model fits well, because the 5% cutoff for 7 df is about 14:
qchisq(.95,7)
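The corresponding goodness-of-fit p-value can also be computed directly; a quick sketch using the numbers above:

```r
## Goodness-of-fit check: residual deviance 8.146 on 7 df
qchisq(0.95, 7)       # 5% cutoff, about 14.07
1 - pchisq(8.146, 7)  # p-value well above .05: no evidence of lack of fit
```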

In conclusion, it looks like living in Ft. Worth statistically significantly increases one's risk of non-melanoma skin cancer, as compared to living in St. Paul. The positive coefficient (.80) predicts an increase of .8 in the log odds of skin cancer in each age group, corresponding to an odds ratio of 2.22 (exp(.8)). In other words, the odds of skin cancer for women in Ft. Worth are 2.22 times the odds for women of the same age in St. Paul.
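The conversion from the log odds scale to the odds ratio scale, along with an approximate 95% CI for the odds ratio obtained by exponentiating the endpoints of the CI for the coefficient, can be sketched as follows (using the unrounded estimate from the table, which gives an odds ratio slightly above the 2.22 quoted above):

```r
## Odds ratio for Ft.Worth vs St. Paul, with an approximate 95% CI
est <- 0.80654; se <- 0.05228
exp(est)                         # odds ratio, about 2.24
exp(est + c(-1, 1) * 1.96 * se)  # about (2.02, 2.48)
```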




Albyn Jones
jones@reed.edu