Life Expectancy





Demographers work with the distribution of age at death in the form of a "life table", which displays the information in tabular form. Starting with 100,000 live births at age zero, the life table shows the number expected to survive to each subsequent age. If you divide that number by 100,000, you have the probability of surviving to the given age. This is known as the "survivor function", which is just the complement of the cumulative distribution function for age at death. In other words, if F(x) = Pr(age at death <= x) is the distribution function for age at death, then the survivor function is S(x) = 1 - F(x). Similarly, F(x) = 1 - S(x). To see the 1997 life tables for US males and females, issue the R command:

Life97 <- read.table("http://people.reed.edu/~jones/141/Life97.dat",
                      header=TRUE)
The read.table() function reads a table of data (for example, one created by a spreadsheet and saved as plain text delimited by tabs, spaces, or commas) from your local machine or from a URL, as in this example. If the file is comma delimited, use read.table("file",header=TRUE,sep=",").
Now enter the name of the data.frame to see the life table:
           Life97
You can see that in the older age groups, the ratio of females to males becomes increasingly large. It is worth noting that life tables like these are based on historical data and model-based projections. Your mileage may vary!
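
As a side note on read.table(), here is a minimal sketch of reading comma-delimited data and looking at the female-to-male ratio. The numbers are a made-up toy table, not the 1997 US data, and the name Toy is hypothetical. The text= argument lets read.table() parse a character string, so the example runs without a file or a network connection.

```r
# Toy comma-delimited life table (made-up numbers, NOT the 1997 US data).
csv_text <- "age,Male,Female
0,100000,100000
1,98000,99000
2,92000,95000
3,70000,80000
4,30000,40000"

# text= makes read.table() parse a string instead of a file or URL;
# sep="," tells it the fields are comma delimited.
Toy <- read.table(text = csv_text, header = TRUE, sep = ",")
Toy

# Ratio of female to male survivors at each age.
ratio <- Toy$Female / Toy$Male
ratio
```

In this toy table the ratio grows with age, which is the same pattern to look for in Life97.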

Computing Life Expectancy from the life table

Life97$Male is the data for males, and Life97$Female is the data for females. Alternative equivalent expressions are Life97[,"Male"] and Life97[,"Female"], or Life97[,2] and Life97[,3]. Many roads lead to Rome.
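
The equivalence of these expressions can be checked directly. The sketch below uses a small made-up data frame (Toy is a hypothetical name) with the column order assumed above: age first, then Male, then Female.

```r
# Toy data frame with the column order assumed in the text (age, Male, Female).
# The counts are made up for illustration.
Toy <- data.frame(age    = 0:2,
                  Male   = c(100000, 98000, 92000),
                  Female = c(100000, 99000, 95000))

by_dollar   <- Toy$Male        # access by name with $
by_name     <- Toy[, "Male"]   # access by name with [ , ]
by_position <- Toy[, 2]        # access by position; relies on column order

identical(by_dollar, by_name)
identical(by_dollar, by_position)
```

Access by name keeps working if the columns are reordered, whereas access by position does not.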

These tables represent the number surviving to a given age. If we divide by 100000, we will have the proportion surviving to a given age. We will work with the distribution for age at death. "Death at or before n" is the complement of the event "survival through age n".

To compute the distribution functions for age at death for females and males, use the following R commands:

           Fem  = 1 - Life97$Female/100000
           Male = 1 - Life97$Male/100000
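
To see what this computation does, here is a small sketch on made-up survivor counts for ages 0 through 5 (the vector names surv, S, and cdf are hypothetical, and the numbers are not from Life97). It assumes, as in the life table, that everyone is alive at age 0.

```r
# Hypothetical survivor counts at ages 0..5 out of 100000 births.
surv <- c(100000, 99000, 95000, 80000, 40000, 0)

S   <- surv / 100000   # survivor function: Pr(age at death > n)
cdf <- 1 - S           # distribution function: Pr(age at death <= n)

data.frame(age = 0:5, S, cdf)

# A distribution function never decreases; here it starts at 0 and ends at 1.
all(diff(cdf) >= 0)
```

The distribution function ends at 1 only because the toy table runs until no survivors remain.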
Here are commands to plot the cumulative distribution functions: F(n) = Pr(age at death less than or equal to n). Note the use of 'par(mfrow=c(2,1))' to create a window with two rows and one column of plots. Try 'par(mfrow=c(1,2))'.
 
           par(mfrow=c(2,1))
           barplot(Fem,names.arg=Life97$age,space=0)
           title("Females")
           barplot(Male,names.arg=Life97$age,space=0)
           title("Males")
The probability of death at age n is the difference between F(n) and F(n-1), that is, the probability of death at or before age n minus the probability of death at or before age n-1. In R we can use the following commands to compute the probabilities of death in a given year:
           Pf <- diff(Fem)
           Pm <- diff(Male)
The diff() function computes the differences between successive values in a vector. The result is the probability of death in year N, for N running from 1 to 119. Here are commands to plot the cumulative distribution along with the probability density function (histogram) just described for females:
           barplot(Fem,names.arg=Life97$age,space=0)
           barplot(Pf,names.arg=Life97$age[-1],space=0,col="green")
Make sure you understand the relation between these pairs of graphs.
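
One way to check the relation numerically is sketched below on made-up values of F(0), ..., F(5) (not the Life97 data; the names cdf, p, and rebuilt are hypothetical). Differencing the distribution function gives the yearly probabilities, and cumulative sums of those probabilities give the distribution function back.

```r
# Hypothetical distribution function values F(0), ..., F(5).
cdf <- c(0, 0.01, 0.05, 0.20, 0.60, 1)

# Pr(death in year n) = F(n) - F(n-1), for n = 1..5.
p <- diff(cdf)
p

# Running totals of the yearly probabilities rebuild F(1), ..., F(5).
rebuilt <- cdf[1] + cumsum(p)
all.equal(rebuilt, cdf[-1])

# The yearly probabilities sum to F(5) - F(0).
sum(p)
```

Note that diff() returns one fewer value than its input, which is why the plotting command above drops the first age label.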

Compare the shapes of the density functions (histograms) for males and females. What differences can you see? Can you explain the differences?

We are now ready to compute the expected age at death, known more positively as the "life expectancy". The age categories are 1 to 119 years, in 1 year intervals.

           age = 1:119
           sum( age*Pf )
In fact, people don't all die on their birthdays; deaths are distributed throughout the year. If the deaths are roughly evenly distributed within years, then the average age at death in year n would be n-1/2. How much difference does that make?
           age <- (1:119)-1/2
           sum( age*Pf )
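
The size of the difference can be worked out in advance: sum((n-1/2)*p) = sum(n*p) - sum(p)/2, so when the probabilities sum to 1 the midpoint version is lower by exactly half a year. A minimal sketch on made-up probabilities (not the Life97 data; all names here are hypothetical):

```r
# Hypothetical Pr(death in year n) for n = 1..5; they sum to 1.
p <- c(0.01, 0.04, 0.15, 0.40, 0.40)

age_end <- 1:5             # death assigned to the end of each year
age_mid <- age_end - 1/2   # death assigned to the middle of each year

e_end <- sum(age_end * p)
e_mid <- sum(age_mid * p)

# The gap equals sum(p)/2.
e_end - e_mid
sum(p) / 2
```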

Conditional Life Expectancy

What is the conditional probability of death at age k, given you have survived to age n? Can you see how to compute the conditional probabilities for k = n+1, n+2, n+3, ...? Hint: you need to use the definition of conditional probability! You are conditioning on an event of the form "age > n".
 
    Pr(age > n) = Pr(age = n+1)+ Pr(age=n+2) + ...  
For example, the probability of survival past age 50 is
sum(Pf[51:119])
It is a bother to have to figure out which categories to sum, especially when the age categories don't exactly match the row numbers of the table. A neat solution is to let R do the selection for you:
sum(Pf[age > 50])
To see what R is doing, just print the data,
data.frame(age,Pf)
and then compare to the selected subset:
data.frame(age,Pf)[age>50,]
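
Logical selection matters most when the ages do not line up with row numbers. The sketch below uses a made-up coarse table whose ages are interval midpoints (the numbers and the name keep are hypothetical) and shows which rows a comparison such as age > 5 selects.

```r
# Hypothetical coarse table: ages are interval midpoints, not row numbers.
age <- c(0.5, 3, 7.5, 12.5, 17.5)
p   <- c(0.10, 0.05, 0.05, 0.30, 0.50)

keep <- age > 5   # logical vector: one TRUE or FALSE per row
keep
which(keep)       # the row numbers that the logical vector selects

# Selecting with the logical vector is the same as selecting rows 3 to 5.
sum(p[keep])
sum(p[3:5])
```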

The conditional probability of death at age n+1, given death at age greater than n, is

 
    Pr(age = n+1 | age > n) = Pr(age = n+1 and age > n)/Pr(age > n)

                            = Pr(age = n+1)/Pr(age > n)
The intersection of the events "age > n" and "age = n+1" is just the latter event. Thus we can compute the conditional probabilities of death at each age greater than n simply by dividing by their sum.

To get the conditional probabilities of death at each age given survival to age 50 for US women, use:

 
           Pf50 <- Pf[age>50]/sum(Pf[age>50])
In other words, Pf50 contains the probability of death at each age past 50, given survival through age 50. This conditional distribution is a perfectly good probability distribution. The probabilities sum to 1:
 
           sum(Pf50) 
We can compute the expected age at death, given survival to age 50, using this conditional distribution. This is known as the conditional life expectancy. The sample space is the set of possible outcomes (51:119, or the corresponding midpoints if you defined age as (1:119)-1/2), so using the definition of expectation, the conditional life expectancy for a US female surviving to age 50 is
           sum(age[age>50]*Pf50)
It is greater than the unconditional expected value, because the unconditional life expectancy includes those who die at a younger age. It is not wildly different, though. To see why, look at the probability of survival past age 50:
 
           sum(Pf[age>50])
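
The steps in this section can be collected into one small function. This is only a sketch: cond_life_exp is a hypothetical helper, not part of R or of the original handout, and the probabilities below are made up rather than taken from Life97.

```r
# Hypothetical helper: expected age at death given survival past age n.
#   age : vector of ages (or midpoints) attached to each probability
#   p   : vector of Pr(death at that age)
#   n   : the age survived past
cond_life_exp <- function(age, p, n) {
  keep <- age > n                         # outcomes consistent with "age > n"
  stopifnot(any(keep), sum(p[keep]) > 0)  # cannot condition on an impossible event
  p_cond <- p[keep] / sum(p[keep])        # conditional probabilities, summing to 1
  sum(age[keep] * p_cond)                 # definition of expectation
}

# Made-up probabilities of death in years 1..5.
age <- 1:5
p   <- c(0.01, 0.04, 0.15, 0.40, 0.40)

cond_life_exp(age, p, 0)   # conditioning on age > 0 changes nothing here
cond_life_exp(age, p, 3)   # given survival past age 3
```

With the handout's objects, cond_life_exp(age, Pf, 50) should agree with sum(age[age>50]*Pf50).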

Exercises

For the following exercises, turn in the sequence of R commands you used to compute each result as well as the result.

  1. Compute the life expectancy based on these tables for both males and females. Compare the mean length of life to the median length of life. Hint: you can figure out the median by examination of the original life table!
  2. What proportion of each sex survives past age 20? 50? 80? Compute the conditional life expectancy for both males and females, given survival to each of those ages.
  3. Life expectancy varies from group to group. For a dramatically different population, load the life tables for Afghanistan with
    AfghanLife <- read.table("http://people.reed.edu/~jones/141/Afghan.dat",
                          header=TRUE)
    
    This table is much less detailed, with age of death in 5 year increments. To estimate life expectancy you will need to impute an age of death for each interval. The label for each interval is its upper endpoint, so the interval labeled "1" corresponds to deaths in the interval from 0 to 1, the interval labeled "5" to deaths in the interval from 1 to 5 years, etc.

    Compare the overall life expectancy and conditional life expectancy at ages 20, 50, and 80 with the corresponding US values.

    Note: You will need to create an age dataset corresponding to the midpoints of these intervals.

  4. A few years ago a New York Times article reported that 37 well known male symphony directors had an average life span about 5 years greater than the average for US males. Speculation revolved around life-style differences (symphony directors wave their arms about while directing, that must be good exercise!) and possible genetic explanations. Is there a simple statistical explanation connected to conditional expectation? Explain.
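
As a hint for Exercise 3, interval midpoints can be computed from the upper endpoints as sketched below. The labels here are hypothetical and chosen only to match the description in the exercise; check them against the actual labels in AfghanLife before using this.

```r
# Hypothetical interval labels (upper endpoints); the real table may differ.
upper <- c(1, 5, 10, 15, 20)

# Each interval starts where the previous one ended; the first starts at 0.
lower <- c(0, head(upper, -1))

# Impute the age at death as the midpoint of each interval.
mid <- (lower + upper) / 2

data.frame(lower, upper, mid)
```

The midpoint is only one possible imputation; it assumes deaths are spread roughly evenly within each interval, which is questionable for the first year of life.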




Albyn Jones
August 2004