Superimposed Density Plots

An idea similar to 'back-to-back' histograms or stem-and-leaf plots is to superimpose two histograms on each other. Unfortunately, R does not make this easy to do directly, because hist() is a high-level plotting function that, by default, starts a new plot each time it is called.
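One workaround is the add = TRUE argument, which hist() passes on to plot.histogram() so that a second histogram is drawn on top of the existing plot. Below is a minimal sketch. It assumes two simulated samples, x and y, which are stand-ins and not the glucose data. The fills are semi-transparent colours made with rgb(). Both histograms use one common set of breaks so that the bars line up:

```r
# Sketch: superimpose two histograms using add = TRUE.
# x and y are simulated stand-ins, NOT the glucose data.
set.seed(1)
x <- rnorm(50, mean = 0, sd = 1)
y <- rnorm(50, mean = 1, sd = 1)

# One common set of breaks, covering both samples, so the bars line up.
br <- pretty(range(x, y), n = 15)

# Compute (without drawing) to find a y-limit that fits both histograms.
hx <- hist(x, breaks = br, plot = FALSE)
hy <- hist(y, breaks = br, plot = FALSE)
ymax <- max(hx$density, hy$density)

# The first histogram starts the plot; the second is added on top of it.
hist(x, breaks = br, freq = FALSE, ylim = c(0, ymax),
     col = rgb(0, 0, 1, 0.4), main = "Two superimposed histograms", xlab = "value")
hist(y, breaks = br, freq = FALSE, add = TRUE, col = rgb(1, 0, 0, 0.4))
```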

Let's start by setting up some variables to play with, and opening the graphics window.

Glucose1 is the glucose tolerance test data for 53 non-pregnant women, and Glucose2 is the corresponding data for 52 women who were pregnant 3 times during the study period. It will be easier to work with the data if we pick out a couple of columns and rename them.

	A1 <- Glucose1[,1]      # non-pregnant women, year 1
	A2 <- Glucose1[,2]      # non-pregnant women, year 2
	B1 <- Glucose2[,1]      # pregnant women, year 1

	par(mfrow=c(2,2))       # make 4 plots in the window

If you wish to reset the plot window to a single plot, use

        par(mfrow=c(1,1))
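A common R idiom is to save the old settings at the moment you change them. When you set a graphics parameter, par() invisibly returns the previous values, and you can pass those back to par() later. Here is a minimal sketch; the four random panels are only placeholders:

```r
# Sketch: switch to a 2 x 2 layout temporarily, then restore the old one.
old <- par(mfrow = c(2, 2))   # 'old' holds the previous setting of mfrow

# Four placeholder panels, one per cell of the grid.
for (i in 1:4) plot(density(rnorm(30)), main = paste("panel", i))

par(old)                      # restore the layout that was in effect before
```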

Now let's plot a histogram of the dataset A1 and overlay a normal density curve (the so-called bell-shaped curve) with the same mean and standard deviation. Here is the add.norm function, which adds a normal density curve with a given mean and standard deviation to an existing plot:

	add.norm <- function(m, s, r, lty = 3, lwd = 2, col = "blue")
	{
	        # Adds a normal density curve to a histogram or other existing plot.
	        # m is the mean, s is the sd, r is an optional range c(lo, hi);
	        # if r is missing (or not of length 2), m +/- 3 sd is used.
	        if (missing(r) || length(r) != 2) r <- c(m - 3 * s, m + 3 * s)
	        x <- seq(r[1], r[2], length.out = 200)   # 200 evenly spaced points
	        y <- dnorm(x, mean = m, sd = s)          # normal density at those points
	        lines(x, y, lty = lty, lwd = lwd, col = col)
	}
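You may want the curve's coordinates rather than the drawing, for example to check them or to compute axis limits. In that case the same computation can be pulled out into its own function. This is a minimal sketch; norm.curve() is a hypothetical helper, not part of R or of the original code:

```r
# Hypothetical helper: the points add.norm() would draw, without drawing.
# m = mean, s = sd, r = optional c(lo, hi) range (default: m +/- 3 sd),
# n = number of points along the curve.
norm.curve <- function(m, s, r, n = 200) {
  if (missing(r) || length(r) != 2) r <- c(m - 3 * s, m + 3 * s)
  x <- seq(r[1], r[2], length.out = n)
  list(x = x, y = dnorm(x, mean = m, sd = s))
}

pts <- norm.curve(0, 1)
range(pts$x)   # -3  3
max(pts$y)     # close to the peak height dnorm(0), since 0 is not a grid point
```

Because lines() accepts a list with components x and y, lines(norm.curve(m, s), lty = 3, lwd = 2, col = "blue") draws the same curve as add.norm(m, s).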

Let's try it with some of our data:

	hist(A1,freq=FALSE,col="yellow")     
	m <- mean(A1)
	s <- sd(A1)
	add.norm(m,s)
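The freq = FALSE argument is what makes this overlay meaningful. It draws the histogram on the density scale, where each bar's height is its count divided by (n times the bar width). The bars' total area is therefore 1, the same as the area under the normal curve. Here is a quick check on simulated stand-in data:

```r
# Sketch: with freq = FALSE a histogram shows densities, not counts.
# v is simulated, not the glucose data.
set.seed(5)
v <- rnorm(53, mean = 100, sd = 20)
h <- hist(v, plot = FALSE)   # the same computation hist() does before drawing

# Each bar's density is count / (n * bar width) ...
all.equal(h$density, h$counts / (length(v) * diff(h$breaks)))   # TRUE
# ... so the total area of the bars is 1 (up to rounding).
sum(h$density * diff(h$breaks))
```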

Pretty ugly, if you ask me. Histograms are crude indicators of normality: we can tell that the distribution is most likely unimodal, not horribly asymmetric, and free of wild outliers, but we can't really see the shape of the distribution clearly. Let's try a density trace:

	plot(density(A1))     
	add.norm(m,s)
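A density trace is a kernel density estimate. With R's default Gaussian kernel, it is the average of normal curves, one centred at each observation, each with standard deviation equal to the bandwidth bw. Below is a minimal sketch that recomputes the curve by hand on simulated data (x is a stand-in, not A1). density() itself uses a faster binned FFT approximation, so the two agree closely but not exactly:

```r
# Sketch: a Gaussian kernel density estimate computed directly and
# compared with density(). x is simulated, not the glucose data.
set.seed(2)
x <- rnorm(100, mean = 10, sd = 3)

d <- density(x)   # defaults: Gaussian kernel, bandwidth bw.nrd0(x), 512 grid points
h <- d$bw         # the bandwidth density() chose

# At each grid point t: the average of normal densities with sd h,
# one centred at each observation.
by.hand <- sapply(d$x, function(t) mean(dnorm(t, mean = x, sd = h)))

# Largest disagreement; small, because density() uses a binned FFT
# approximation rather than this direct sum.
max(abs(by.hand - d$y))
```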

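The plot axes may not be quite right: they are chosen to fit the density trace alone, so part of the normal curve added afterwards can be cut off. You can adjust the vertical limits using the "ylim" argument to plot: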

	plot(density(A1),ylim=c(0, .025),col="red")     
	add.norm(m,s)
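Instead of guessing an upper limit such as .025, you can compute it from the two curves being drawn. The limit needs to cover the largest value of the density trace and the peak of the normal curve, which is dnorm(m, m, s) = 1/(s*sqrt(2*pi)). Here is a sketch with a hypothetical helper, both.ylim(), on simulated stand-in data; mu and sigma play the roles of m and s:

```r
# Hypothetical helper: a ylim that fits both a density trace d
# (a density() result) and a normal curve with mean m and sd s.
both.ylim <- function(d, m, s) {
  normal.peak <- dnorm(m, mean = m, sd = s)   # equals 1 / (s * sqrt(2 * pi))
  c(0, max(d$y, normal.peak))
}

set.seed(3)
obs <- rnorm(60, mean = 50, sd = 15)   # stand-in data, not A1
mu <- mean(obs)
sigma <- sd(obs)
d <- density(obs)

plot(d, ylim = both.ylim(d, mu, sigma), col = "red")
# Draw the normal curve with curve() so this block runs without add.norm();
# inside curve(), 'x' is the grid of horizontal positions.
curve(dnorm(x, mu, sigma), add = TRUE, lty = 3, lwd = 2, col = "blue")
```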

The add.norm() function takes as its arguments the mean and standard deviation of the normal density curve we wish to plot. In this case, we wanted a normal curve with the same mean and standard deviation as the dataset represented by the histogram or density plot.
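For reference, the curve add.norm() draws is the normal density with mean m and standard deviation s, which is what dnorm(x, m, s) computes:

```latex
f(x) = \frac{1}{s\sqrt{2\pi}} \exp\!\left( -\frac{(x - m)^2}{2 s^2} \right)
```

About 99.7% of the area under this curve lies between m - 3s and m + 3s. As a result, add.norm()'s default range shows essentially the whole curve.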

Next, let's plot the density traces for the two samples of glucose tolerance test data, A1 and A2, in the same picture:

	plot(density(A1),col="blue",lwd=2)
	lines(density(A2),lty=2,lwd=2,col="brown")

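Now add the third sample, B1, to the same picture:

	plot(density(A1))
	lines(density(A2),lty=4)
	lines(density(B1),lty=3)

Parts of the later curves may be cut off, because the axis limits are chosen to fit the first density trace (A1) only. We can fix the boundaries of the plot by specifying xlim and ylim. First, look at the range (min, max) of each dataset: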

        range(A1)
        range(A2)
        range(B1)

The smallest value is at -37 (from A2) and the largest at 104 (from B1). We should leave a little extra room at the ends:

	plot(density(A1),xlim=c(-50,120),ylim=c(0,.025),
               lty=1,lwd=2,col="blue")
	lines(density(A2),lty=4,lwd=2,col="turquoise")
        lines(density(B1),lty=3,lwd=2,col="red")
        abline(h=0)
        legend(50,.02,
           legend=c("NonPregnant yr1", "NonPregnant yr2","Pregnant yr1"),
           lty=c(1,4,3),lwd=c(2,2,2),
           col=c("blue","turquoise","red"))
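Rather than reading the limits off by eye, you can compute them from the density objects themselves. Keep in mind that density() extends each trace beyond the data, by default 3 bandwidths past the smallest and largest values (its cut argument); this is why extra room is needed. Below is a sketch with a hypothetical helper, density.limits(), on simulated stand-in samples. It also uses legend()'s "topright" keyword instead of explicit coordinates:

```r
# Hypothetical helper: xlim and ylim that contain every trace
# in a list of density() results.
density.limits <- function(dens) {
  list(xlim = range(unlist(lapply(dens, function(d) d$x))),
       ylim = c(0, max(unlist(lapply(dens, function(d) d$y)))))
}

# Simulated stand-in samples (NOT the glucose data).
set.seed(4)
samples <- list(a = rnorm(53, mean = 20, sd = 20),
                b = rnorm(53, mean = 25, sd = 20),
                c = rnorm(52, mean = 40, sd = 25))
dens <- lapply(samples, density)
lim <- density.limits(dens)

cols <- c("blue", "turquoise", "red")
ltys <- c(1, 4, 3)
plot(dens[[1]], xlim = lim$xlim, ylim = lim$ylim,
     lty = ltys[1], lwd = 2, col = cols[1], main = "Three density traces")
for (i in 2:3) lines(dens[[i]], lty = ltys[i], lwd = 2, col = cols[i])
legend("topright", legend = names(samples), lty = ltys, lwd = 2, col = cols)
```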


Albyn Jones