Stem and Leaf plots


A stem and leaf plot is essentially a histogram that also displays the leading significant digits of the raw data. If we don't like the default display, we can alter the interval size and other features of the plot.

 
stem(Glucose1[,1])       # first column of data

Decimal point is 1 place to the right of the colon

  -2 : 32
  -1 : 98754110
  -0 : 85321
   0 : 00012489999
   1 : 1122445666888
   2 : 0445568
   3 : 000129
   4 :
   5 : 0
Compare to a similar histogram:
 hist(Glucose1[,1],breaks=seq(-31,59,10),col="yellow",
     main="First Glucose Test")
There are minor differences, mainly due to different handling of the interval boundaries.
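
The boundary effect can be checked numerically. The following is only a sketch, not part of the original session: it assumes that the 53 values in the sorted listing later in this section are the entire first column, and it stores them in a hypothetical vector `x` so that it runs without the Glucose1 data. The half-integer cut points used for the stem rows are also an assumption; they work only because the data are whole numbers.

```r
# Assumption: the values of Glucose1[,1], copied from the sorted listing.
x <- c(-23, -22, -19, -18, -17, -15, -14, -11, -11, -10, -8, -5, -3, -2, -1,
       0, 0, 0, 1, 2, 4, 8, 9, 9, 9, 9, 11, 11, 12, 12,
       14, 14, 15, 16, 16, 16, 18, 18, 18, 20, 24, 24, 25, 25, 26,
       28, 30, 30, 30, 31, 32, 39, 50)

# Counts in the histogram intervals; by default hist() uses
# right-closed intervals (a, b], so -11 and -1 fall in the lower interval.
hist_counts <- hist(x, breaks = seq(-31, 59, 10), plot = FALSE)$counts

# Counts per stem row: -29..-20, -19..-10, -9..-1, 0..9, 10..19, ...
# Half-integer cut points separate these groups for whole-number data.
stem_breaks <- c(-29.5, -19.5, -9.5, -0.5, 9.5, 19.5, 29.5, 39.5, 49.5, 59.5)
stem_counts <- as.vector(table(cut(x, breaks = stem_breaks)))

# Put the two sets of counts side by side.
print(rbind(hist = hist_counts, stem = stem_counts))
```

Under these assumptions only one observation, -10, changes rows: the stem display files it under stem -1, while the histogram interval (-11, -1] groups it with the single-digit negative values.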

To understand how the stem and leaf display is constructed, print the data in sorted order:

      sort(Glucose1[,1])
You should see the values sorted into ascending order:
 -23 -22 -19 -18 -17 -15 -14 -11 -11 -10  -8  -5  -3  -2  -1   
   0   0   0   1   2   4   8   9   9   9   9  11  11  12  12  
  14  14  15  16  16  16  18  18  18  20  24  24  25  25  26  
  28  30  30  30  31  32  39  50
The values -23 and -22 appear in the stem display in the top row:
      -2 : 32
The leading digit is the "stem" value (-2), and the next significant digit of each observation is its "leaf" (3 and 2, respectively). The stem and leaf display thus builds a histogram that encodes the first two digits of each observation, so you can reconstruct your data to that precision from the display. For example, the row:
        3 : 000129 
corresponds to the values 30, 30, 30, 31, 32, and 39.
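
The construction can also be carried out by hand. The sketch below is an illustration only, not how stem() is implemented: it assumes whole-number data with stems in tens, again uses a hypothetical vector `x` holding the values from the sorted listing, and skips empty stems such as 4. The names `stem_label`, `leaves`, and `rows` are made up for this example.

```r
# Assumption: the values of Glucose1[,1], copied from the sorted listing.
x <- c(-23, -22, -19, -18, -17, -15, -14, -11, -11, -10, -8, -5, -3, -2, -1,
       0, 0, 0, 1, 2, 4, 8, 9, 9, 9, 9, 11, 11, 12, 12,
       14, 14, 15, 16, 16, 16, 18, 18, 18, 20, 24, 24, 25, 25, 26,
       28, 30, 30, 30, 31, 32, 39, 50)
x <- sort(x)

# Stem = sign plus tens digit (kept as text so that "-0" and "0" stay distinct);
# leaf = units digit.
stem_label <- paste0(ifelse(x < 0, "-", ""), abs(x) %/% 10)
leaves     <- abs(x) %% 10

# Paste the leaves of each stem together, keeping the stems in data order.
rows <- tapply(leaves, factor(stem_label, levels = unique(stem_label)),
               paste, collapse = "")
cat(sprintf("%4s : %s", names(rows), rows), sep = "\n")

# Going the other way: rebuild the observations behind the row "3 : 000129".
row_leaves <- as.integer(strsplit("000129", "")[[1]])
row_values <- 10 * 3 + row_leaves
print(row_values)   # 30 30 30 31 32 39
```

Because the leaves are pasted in ascending data order, the negative rows read from the largest leaf down (for example 32 for the values -23 and -22), as in the display printed by stem().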
stem(Glucose1[,2])       # second column of data

Decimal point is 1 place to the right of the colon

  -3 : 760
  -2 : 62
  -1 : 53320
  -0 : 8665541
   0 : 000222367779
   1 : 0013566789
   2 : 002345
   3 : 03466
   4 : 47
   5 : 3
These two distributions look pretty similar in shape and spread. We might expect them to, since they are just repeated tests on the same subjects. Try looking at the third set of data, that is:
	stem(Glucose1[,3])
This one looks a bit different. For one thing, the "stem" values are not the same as in the first two displays.

The stem function chooses a scale by a reasonable "rule of thumb", but the result is not always the best choice. Try each of the following:

	stem(Glucose1[,3],scale=2)     # the default is scale = 1
	stem(Glucose1[,3],scale=3)
	stem(Glucose1[,3],scale=4)
Increasing the "scale" value spreads the stem and leaf diagram over more rows, providing more detail.
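
To see the effect of "scale" without the Glucose1 data at hand, the sketch below applies it to a stand-in vector. This is an assumption made purely for illustration: `x` holds the first-column values from the sorted listing, not the third column, so the rows printed will differ from what you see for Glucose1[,3]. The helper variables `default_rows` and `wider_rows` are hypothetical names.

```r
# Assumption: stand-in data, the first-column values from the sorted listing.
x <- c(-23, -22, -19, -18, -17, -15, -14, -11, -11, -10, -8, -5, -3, -2, -1,
       0, 0, 0, 1, 2, 4, 8, 9, 9, 9, 9, 11, 11, 12, 12,
       14, 14, 15, 16, 16, 16, 18, 18, 18, 20, 24, 24, 25, 25, 26,
       28, 30, 30, 30, 31, 32, 39, 50)

# Capture what stem() prints so the two displays can be compared.
default_rows <- capture.output(stem(x))             # scale = 1
wider_rows   <- capture.output(stem(x, scale = 2))  # longer display

# Show the longer display and how many lines each version printed.
cat(wider_rows, sep = "\n")
cat("lines printed with scale = 1:", length(default_rows), "\n")
cat("lines printed with scale = 2:", length(wider_rows), "\n")
```

With a larger "scale" each stem is typically split across several rows, so the same leaves are spread over a longer display.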

I hardly ever use histograms anymore; stem and leaf plots, while not as elegant in appearance, contain much more information.


Albyn Jones
Feb 2012