Boxplots

A boxplot is essentially a graphical display of a 5-number summary (min, Q1, median, Q3, max). The body of the box spans the quartiles, with a line added at the median. The "whiskers", or lines extending out from the box, reach to the most extreme observations that lie no more than 1.5 times the IQR beyond the quartiles. Outliers are displayed as points or lines beyond the whiskers.

	boxplot(Glucose1[,3])
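The whisker rule described above can be worked through by hand. The following is a minimal sketch in R; the vector x is made-up data standing in for one column of Glucose1, and the names lower_fence and upper_fence are ours, not part of R. R's boxplot takes its quartiles from fivenum (the "hinges"), which can differ slightly from what quantile returns.

```r
# Hypothetical data standing in for one column of Glucose1
x <- c(60, 65, 70, 72, 75, 78, 80, 83, 85, 130)

five <- fivenum(x)            # min, lower hinge, median, upper hinge, max
iqr  <- five[4] - five[2]     # spread of the box

# Limits beyond which an observation is drawn as an outlier
lower_fence <- five[2] - 1.5 * iqr
upper_fence <- five[4] + 1.5 * iqr

# Whiskers stop at the most extreme observations inside the limits
inside   <- x >= lower_fence & x <= upper_fence
whiskers <- range(x[inside])
outliers <- x[!inside]

print(five)
print(whiskers)
print(outliers)
```

Here the largest value lies beyond the upper limit, so the upper whisker stops at the next largest observation and the extreme value is plotted on its own.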

Note that the boxplot function's definition of an outlier differs from the default definition used by the stem function.
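To see exactly which observations the boxplot rule flags, R provides boxplot.stats, which returns the plotted summary and the outliers without drawing anything. This is only a sketch on made-up data (the vector x is hypothetical); the coef argument is the multiplier of the IQR, 1.5 by default, and the range argument of boxplot plays the same role in R.

```r
# Hypothetical data standing in for one column of Glucose1
x <- c(60, 65, 70, 72, 75, 78, 80, 83, 85, 130)

default_rule <- boxplot.stats(x)            # coef = 1.5 by default
wide_rule    <- boxplot.stats(x, coef = 5)  # much more permissive limits

print(default_rule$out)   # observations flagged under the default rule
print(wide_rule$out)      # observations flagged with the wider limits
```

Changing the multiplier changes which points count as outliers, which is one reason two functions with different defaults can disagree about the same data.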

The real utility of boxplots is for comparing distributions of several samples. Try it:

	
	boxplot(Glucose1, col="tan")

When given a matrix or data.frame, the boxplot function assumes that you wish to make side-by-side boxplots for the columns.
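The column-by-column behaviour can be checked on a small example. The sketch below uses a made-up data frame d with hypothetical column names t1, t2 and t3; with plot = FALSE the function returns the per-column summaries instead of drawing, so the grouping is easy to inspect. Drop plot = FALSE to see the side-by-side boxes.

```r
# Hypothetical data frame: one column per sample
d <- data.frame(t1 = c(1, 2, 3, 4, 5),
                t2 = c(2, 4, 6, 8, 10),
                t3 = c(5, 5, 6, 7, 7))

b <- boxplot(d, plot = FALSE)

print(b$names)        # one box per column, labelled by column name
print(b$n)            # number of observations in each box
print(b$stats[3, ])   # row 3 of the summary holds the medians
```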

To compare the non-pregnant women with the pregnant women, include both groups in a single boxplot command. Unfortunately the dimnames in the two datasets are similar, so you will want to supply your own labels through the names argument, along the lines of the following:

	
	boxplot(Glucose1[,1],Glucose1[,2],Glucose1[,3],
		Glucose1[,4],Glucose1[,5],Glucose1[,6],
		Glucose2[,1],Glucose2[,2],Glucose2[,3],
		names=c("N1","N2","N3","N4","N5","N6","P1","P2","P3"),
		col=c(rep("yellow",6),rep("green",3)))
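Typing out every column gets tedious as the number of columns grows. One possible shortcut, sketched here under assumptions, is to collect the columns of both datasets into a single named list and hand that to boxplot. G1 and G2 are simulated stand-ins with the same number of columns as Glucose1 and Glucose2; they are not the real data, and the means and spreads used to generate them are arbitrary.

```r
# Simulated stand-ins for Glucose1 (6 columns) and Glucose2 (3 columns)
set.seed(1)
G1 <- matrix(rnorm(60, mean = 80, sd = 10), ncol = 6)
G2 <- matrix(rnorm(30, mean = 90, sd = 10), ncol = 3)

# Combine the columns of both datasets into one list, one entry per box
groups <- c(as.data.frame(G1), as.data.frame(G2))
names(groups) <- c(paste0("N", 1:6), paste0("P", 1:3))

# plot = FALSE returns the summaries; drop it (and add col=) to draw
b <- boxplot(groups, plot = FALSE)
print(b$names)
```

With the real data, replacing G1 and G2 by Glucose1 and Glucose2 and adding the col argument from the command above should give the same picture.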

What are the important differences between the two groups, and are there any other interesting features of the distributions that are apparent in these boxplots?

Math 141 Index

Albyn Jones