Histograms

Let's take a look at the distributions for non-pregnant women first. Try

	
hist(Glucose1[,1],col="yellow",main="First Glucose Test")
#    or, if you assigned A1 <- Glucose1[,1]
hist(A1,col="yellow",main="First Glucose Test")

Notice that the labels on the vertical axis are counts. To look at the "density" (i.e., relative frequency) histogram, use

	
hist(Glucose1[,1],freq=FALSE,col="turquoise",main="First Glucose Test")

Sometimes you will want to look at raw counts, for example when you are familiarizing yourself with a dataset, checking a dataset you have entered for typos, or you want to know exactly how many cases fell in a certain range.

When studying the density, the counts are not as important as the relative frequencies. When you use the `freq=FALSE' argument in hist(), you are asking for the relative frequency histogram, which has total area 1. The area of each bar equals the relative frequency of its interval. For example, in the interval from 0 to 10, the frequency histogram shows a count of 8. The height of that bar in the relative frequency histogram is about 0.015, and its width is 10. Thus the area for that interval is about 0.15 (15% of the sample of 53), and 0.15*53 is about 8.
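You can check this area arithmetic yourself. The sketch below uses simulated numbers as a stand-in for Glucose1[,1] (the vector x and its 53 values are made up for illustration, so its counts will not match the ones quoted above). Calling hist() with plot=FALSE returns the breaks, counts, and densities without drawing anything.

```r
# Hypothetical stand-in for Glucose1[,1]: 53 simulated values, not the real data
set.seed(1)
x <- rnorm(53, mean = 10, sd = 15)

# plot = FALSE returns the histogram's numbers without drawing it
h <- hist(x, plot = FALSE)

widths <- diff(h$breaks)        # width of each interval
areas  <- h$density * widths    # area of each bar = relative frequency

sum(areas)                      # total area of the density histogram
areas * length(x)               # multiplying by the sample size recovers the counts
h$counts                        # the counts hist() reports, for comparison
```

The last two lines should print the same numbers, which is the same calculation as 0.15*53 in the text.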

It is worth looking at the same data with different break points for the intervals. The seq() function produces a regularly spaced sequence with a given starting value, ending value, and interval size (specified in that order, e.g., seq(1,10,1) gives the same values as 1:10).

	
hist(Glucose1[,1],breaks=seq(-35,55,10),freq=FALSE,col=rgb(0,0,1))

hist(Glucose1[,1],breaks=seq(-30,60,15),freq=FALSE,
                 col=sample(colors(),1))

The `breaks' argument tells R to construct the histogram with the interval boundaries specified in `breaks'. The first example above specifies boundaries -35, -25, -15, -5, ..., 35, 45, 55.
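If you are unsure what boundaries a seq() call produces, print them before handing them to hist(). The break points must cover the whole range of the data, otherwise hist() stops with an error. Here is a small sketch of both checks; the vector x is a made-up stand-in, not the glucose values.

```r
b1 <- seq(-35, 55, 10)
b1                      # -35 -25 -15 -5 5 15 25 35 45 55

b2 <- seq(-30, 60, 15)
b2                      # -30 -15 0 15 30 45 60

length(b1) - 1          # 9: ten boundaries make nine intervals

# x is hypothetical data used only to illustrate the range check
x <- c(-12, 3, 8, 14, 27, 41)
min(b1) <= min(x) && max(x) <= max(b1)   # TRUE means b1 spans the data
```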

In each case, the histogram looks slightly different, but the basic shape seems to be pretty stable.

Finally, try the following commands (suggestion: pick your own colors!):

	
par(mfrow=c(2,2))  # 4 plots in the plot window

hist(Glucose1[,1],breaks=seq(-35,65,10), 
     main="Counts, Equal Intervals")

hist(Glucose1[,1],breaks=seq(-35,65,10),freq=FALSE,
     main="Relative Frequency, Equal Intervals")

hist(Glucose1[,1],breaks=c(-35,-15,-5,5,15,25,35,65),
     main="Unequal Intervals")

hist(Glucose1[,1],breaks=seq(-35,65,2),freq=FALSE,
     main="Too many Intervals")

R tries to protect you from making silly histograms by switching to the relative frequency scaling by default whenever the intervals are not all equal in length. The total area of a histogram should be 1 in the probability scale, or proportional to the sample size in the count scale. With equal-width intervals there is no difficulty in achieving these goals. When intervals have unequal widths, you have to remember to scale the heights (counts or proportions) so that the areas of the bars are proportional to frequency. Many other statistical software packages violate the prescription that area should be proportional to frequency because it is convenient to be able to read the raw counts corresponding to the various intervals directly from the axis labels, but this practice leads to histograms that don't satisfy the condition that area is proportional to probability.
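To see the scaling at work, here is a minimal sketch using the unequal break points from the third plot above. The data vector x is simulated (an assumption for illustration, not the glucose data). The sketch computes the bar heights by hand, as relative frequency divided by interval width, and compares them with the densities hist() computes.

```r
# Hypothetical stand-in data: 53 values that lie inside the break points
set.seed(1)
x <- runif(53, min = -30, max = 60)

b <- c(-35, -15, -5, 5, 15, 25, 35, 65)   # unequal interval widths
h <- hist(x, breaks = b, plot = FALSE)    # compute only, draw nothing

h$equidist                      # FALSE: the intervals are not all the same width

widths   <- diff(b)                   # 20 10 10 10 10 10 30
rel_freq <- h$counts / length(x)      # proportion of the sample in each interval
height   <- rel_freq / widths         # bar height that makes area = proportion

all.equal(height, h$density)    # the hand computation matches hist()
sum(height * widths)            # total area of the bars
```

Dividing by the width is what keeps a wide interval from looking more important than it is: its bar is wider, so it must be shorter to enclose the same area.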

When you use the mfrow option to make multiple plots in the graphics window, the plots will be made in a fixed order (left to right across each row, top to bottom). You can't go back and remake a previous plot without remaking all of the graphs in the panel. To reset the graphics window so it will display a single plot, use:

     par(mfrow=c(1,1))
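Another common idiom is to save the old settings when you change them and restore them afterwards, since par() returns the previous values of whatever you set. A sketch, with simulated data standing in for Glucose1[,1] (the vector x is hypothetical):

```r
# When run as a script rather than interactively, draw to a null device
# so that no plot file is created
if (!interactive()) pdf(NULL)

# Hypothetical stand-in data, not the real glucose values
set.seed(1)
x <- runif(53, min = -30, max = 60)

old <- par(mfrow = c(1, 2))     # 2 plots side by side; old holds the previous layout
hist(x, breaks = seq(-35, 65, 10), main = "Counts")
hist(x, breaks = seq(-35, 65, 10), freq = FALSE, main = "Relative Frequency")
par(old)                        # put the layout back the way it was

par("mfrow")                    # shows the current layout (1 1 on a fresh device)
```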

There are various arguments to the hist() function that we can use to alter the figure, such as adding titles, or changing the labels on the axes. Here is an example.

	
hist(Glucose1[,1],xlab="mg/100ml",freq=FALSE,col="green",
     main="Glucose Tolerance Test Results")

Printing a histogram

In R on a Mac, the easiest way to produce graphical hardcopy is to paste your graphs into your favorite word processor. On Unix there are other methods for handling graphics. It is also possible to export the plot as an Encapsulated PostScript (EPS) file for inclusion in a document, or as a PDF. If you think these might be useful to you, take a look at the help file:

     ?dev.copy
# typical usage:
hist(Glucose1[,1],breaks=seq(-35,55,10),freq=FALSE,col=rgb(0,0,1))
dev.copy(pdf,file="Glucose.pdf")
dev.off()

Don't forget to close the file with dev.off()!
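An alternative to copying the screen plot is to open the pdf device first and draw straight into the file. The sketch below assumes simulated data in place of Glucose1[,1] and writes to a temporary directory; the file name is made up for illustration.

```r
# Hypothetical stand-in data, not the real glucose values
set.seed(1)
x <- runif(53, min = -30, max = 60)

# Hypothetical output location; use any path you like
out <- file.path(tempdir(), "Glucose-sketch.pdf")

pdf(out)                        # open the file as the current graphics device
hist(x, breaks = seq(-35, 65, 10), freq = FALSE, col = rgb(0, 0, 1),
     main = "First Glucose Test")
dev.off()                       # close the device so the file is complete

file.exists(out)                # confirm that the file was written
```

Nothing appears on the screen while the pdf device is open; the plot goes only to the file.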

Albyn Jones