Histograms
Let's take a look at the distributions for non-pregnant women first. Try
hist(Glucose1[,1],col="yellow",main="First Glucose Test")
# or, if you assigned A1 <- Glucose1[,1]
hist(A1,col="yellow",main="First Glucose Test")
Notice that the labels on the vertical axis are counts. To look at the "density" (i.e., relative frequency) histogram, use
hist(Glucose1[,1],freq=FALSE,col="turquoise",main="First Glucose Test")
Sometimes you will want to look at raw counts, for example when you are familiarizing yourself with a dataset, checking a dataset you have entered for typos, or finding out exactly how many cases fell in a certain range.
When studying the density, the counts are not as important as the relative frequencies. When you use the `freq=FALSE' argument in hist(), you are asking for the relative frequency histogram, which has total area 1. Area is proportional to relative frequency. For example, in the interval from 0 to 10, the frequency histogram shows a count of 8. The height of that interval in the relative frequency histogram is about 0.015, and the width is 10. Thus the area for that interval is about 0.15 (15% of the sample), and 0.15*53 (53 being the sample size) is about 8, which matches the count.
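The arithmetic above can be checked from the object that hist() returns. The following is a minimal sketch on simulated data: the vector x is a hypothetical stand-in for Glucose1[,1], which is not reproduced here, so the numbers will differ from those quoted above.

```r
# Hypothetical stand-in for Glucose1[,1]: 53 simulated values (an assumption).
set.seed(1)
x <- rnorm(53, mean = 10, sd = 15)

# plot = FALSE returns the histogram object without drawing it.
h <- hist(x, plot = FALSE)

widths <- diff(h$breaks)       # width of each interval
areas  <- h$density * widths   # area of each bar = relative frequency

sum(areas)                     # total area of the density histogram
areas * length(x)              # area times sample size recovers the counts
h$counts                       # counts, for comparison
```

Multiplying each bar's area by the sample size should reproduce the counts, which is the same calculation as 0.15*53 in the text.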
It is worth looking at the same data with different break points for the intervals. The seq() function produces a regularly spaced sequence with a given starting value, ending value, and interval size (specified in that order, e.g., seq(1,10,1) gives the same values as 1:10).
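As a quick illustration of how seq() builds break points, here is a small sketch. The variable names b and b2 are only for illustration and do not come from the original lab.

```r
# Step size 1 from 1 to 10 gives the same values as 1:10.
all(seq(1, 10, 1) == 1:10)

# Boundaries every 10 units from -35 to 55.
b <- seq(-35, 55, 10)
b
length(b) - 1      # number of intervals these boundaries define

# If the step does not land exactly on the end value,
# the sequence stops at the last value below it.
b2 <- seq(-30, 55, 15)
b2
```

The last case matters for histograms, because hist() stops with an error when the break points do not cover the full range of the data.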
hist(Glucose1[,1],breaks=seq(-35,55,10),freq=FALSE,col=rgb(0,0,1))
hist(Glucose1[,1],breaks=seq(-30,60,15),freq=FALSE,
col=sample(colors(),1))
The `breaks' argument tells R to construct the histogram with
the interval boundaries specified in `breaks'. The first example above
specifies boundaries -35, -25, -15, -5, ..., 35, 45, 55.
In each case, the histogram looks slightly different, but the basic shape seems to be pretty stable.
Finally, try the following commands (suggestion: pick your own colors!):
par(mfrow=c(2,2)) # 4 plots in the plot window
hist(Glucose1[,1],breaks=seq(-35,65,10),
main="Counts, Equal Intervals")
hist(Glucose1[,1],breaks=seq(-35,65,10),freq=FALSE,
main="Relative Frequency, Equal Intervals")
hist(Glucose1[,1],breaks=c(-35,-15,-5,5,15,25,35,65),
main="Unequal Intervals")
hist(Glucose1[,1],breaks=seq(-35,65,2),freq=FALSE,
main="Too many Intervals")
R tries to protect you from making silly histograms: whenever the
intervals are not all equal in length, hist() uses the relative frequency
scaling by default. The total area of a histogram should be 1 in the
probability scale, or proportional to the sample size in the count scale.
With equal-width intervals there is no difficulty in achieving these goals.
When intervals have unequal widths, you have to remember to scale the
heights (counts or proportions) so that the areas of the segments are
proportional to the frequencies. Many other statistical software packages
violate the prescription that area should be proportional to frequency,
because it is convenient to be able to read the raw counts for the various
intervals directly from the axis labels. That practice leads to histograms
that don't satisfy the condition that area is proportional to probability.
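The height scaling for unequal intervals can be inspected directly. The sketch below uses simulated data as a hypothetical stand-in for Glucose1[,1], together with the unequal break points from the example above.

```r
# Hypothetical stand-in data: 53 values spread over the range of the breaks.
set.seed(1)
x <- runif(53, min = -35, max = 65)

b <- c(-35, -15, -5, 5, 15, 25, 35, 65)
h <- hist(x, breaks = b, plot = FALSE)

h$equidist                          # FALSE: the intervals differ in width
widths <- diff(h$breaks)

h$counts / (length(x) * widths)     # height = relative frequency / width
h$density                           # the heights hist() would plot
sum(h$density * widths)             # total area
```

Dividing each relative frequency by its interval width is what keeps a wide interval from looking taller than its share of the data deserves.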
When you use the mfrow option to make multiple plots in the graphics window, the plots will be made in a fixed order (left to right across each row, top to bottom). You can't go back and remake a previous plot without remaking all of the graphs in the panel. To reset the graphics window so it will display a single plot, use:
par(mfrow=c(1,1))
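An alternative to resetting mfrow by hand is to save the old settings when you change them, since par() returns the values it replaced. This is a small sketch on simulated data; x and op are illustrative names, not part of the original lab.

```r
# Hypothetical stand-in data (an assumption).
set.seed(1)
x <- rnorm(53, mean = 10, sd = 15)

op <- par(mfrow = c(1, 2))     # par() returns the settings it replaced
hist(x, main = "Counts")
hist(x, freq = FALSE, main = "Relative Frequency")
par(op)                        # restore the previous layout
```

This restores whatever layout was in effect before, which is convenient if it was not the default single plot.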
There are various arguments to the hist() function that we can use to
alter the figure, such as adding titles, or changing the labels on the
axes. Here is an example.
hist(Glucose1[,1],xlab="mg/100ml",freq=FALSE,col="green",
main="Glucose Tolerance Test Results")
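To save a plot to a file, copy the contents of the graphics window to a file device with dev.copy(). See the help page for details:
?dev.copy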
# typical usage:
hist(Glucose1[,1],breaks=seq(-35,55,10),freq=FALSE,col=rgb(0,0,1))
dev.copy(pdf,file="Glucose.pdf")
dev.off()
Don't forget to close the file with dev.off()!
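Another way to get the same kind of file is to open the pdf() device first, draw the plot into it, and then close it. The sketch below assumes simulated data and a hypothetical file name in a temporary directory. It is an alternative to dev.copy(), not the method used above.

```r
# Hypothetical stand-in data (an assumption).
set.seed(1)
x <- rnorm(53, mean = 10, sd = 15)

# Hypothetical output path in the session's temporary directory.
out <- file.path(tempdir(), "Glucose-sketch.pdf")

pdf(file = out)                              # open the file device
hist(x, freq = FALSE, col = rgb(0, 0, 1))    # the plot is drawn into the file
dev.off()                                    # close the device to finish the file

file.exists(out)                             # check that the file was written
```

With this approach nothing appears in the on-screen graphics window, because the plot goes straight to the file.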