Graphical Methods



This section reviews graphical methods for studying the distribution of a set of data. We will make use of two datasets, which you can load into your R session with:

Glucose1 <- read.table("http://people.reed.edu/~jones/141/Glucose1.dat",
                        header=TRUE)

Glucose2 <- read.table("http://people.reed.edu/~jones/141/Glucose2.dat",
                        header=TRUE)

You should now have two datasets: Glucose1 and Glucose2. Both contain data from repeated administrations of a glucose tolerance test to a sample of women who made repeated visits to Boston City Hospital between 1955 and 1960. Glucose1 has data for each of 53 non-pregnant women, measured yearly. Glucose2 has data for 52 women for each of 3 pregnancies during the same period. In each dataset, the variables test1, test2, etc., record the change in blood glucose level (in mg/100ml): the level is measured first after fasting, then again 1 hour after administration of a dose of 100 grams of glucose. The glucose tolerance test is used to diagnose diabetes. For more details, see the description in Data by Andrews and Herzberg.

The data are in "data frame" format: a rectangular array in which each case (here, an individual) corresponds to a row and each variable corresponds to a column. The row labels for these data frames are just row numbers; the column labels are the names of the variables. To list the whole data frame, just type its name:

        Glucose1
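
Printing a whole data frame can fill the screen. As a minimal sketch, the functions below give a more compact overview. The data frame toy and its values are invented for illustration (they are not the glucose data); the same calls should work on Glucose1 once it is loaded.

```r
# Hypothetical toy data frame standing in for Glucose1 (values are invented)
toy <- data.frame(test1 = c(14, -3, 32, 24),
                  test2 = c(5, 10, -7, 18))

dim(toy)      # number of rows (cases) and columns (variables)
names(toy)    # column labels, i.e. the variable names
head(toy, 3)  # only the first 3 rows
str(toy)      # compact summary: type and first values of each column
```
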
You can list the first column (test1, the first testing blood glucose measurement) in several ways:
	Glucose1[,1]              # list column number 1
(Note: the comma before the 1 is significant. Glucose1[1,] refers to the first row, and Glucose1[1,1] refers to the element in the first row and first column, in this case the number 14.)
	Glucose1[,"test1"]        # list variable named test1
In either case, you should see something like:
 
 Glucose1[,1]
 [1]  14  -3  32  24 -23  15  25  9  20  11   2  30   4  18 -11  16  14 -11  30
[20] -17   0 -10  18 -19  16  25  1 -15  12   8  26  31  -8  -1   0  24   9 -14
[39]  39  12  30   9 -18  28  -5 -22  9  50  -2  11  18  16   0
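
R also offers the $ operator and double brackets for extracting a column. The sketch below uses a small invented data frame (toy is a hypothetical stand-in, not the glucose data) to check that these forms return the same vector, and to contrast them with row and element indexing.

```r
# Hypothetical toy data frame (values are invented)
toy <- data.frame(test1 = c(14, -3, 32, 24),
                  test2 = c(5, 10, -7, 18))

by_number <- toy[, 1]         # column selected by position
by_name   <- toy[, "test1"]   # column selected by name
by_dollar <- toy$test1        # the $ operator, a common shorthand
by_double <- toy[["test1"]]   # double brackets also return the column as a vector

# Each comparison should return TRUE
identical(by_number, by_name)
identical(by_name, by_dollar)
identical(by_dollar, by_double)

toy[1, ]    # first row, returned as a one-row data frame
toy[1, 1]   # single element: row 1, column 1
```
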

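It will be easier to work with the data if you create a vector containing just the data from the first test, so that you don't have to keep referring to it as the first column of the Glucose1 data frame. The commands below do this, and likewise store the second column of Glucose1 and the first column of Glucose2: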

 A1 <- Glucose1[,1]
 A2 <- Glucose1[,2]
 B1 <- Glucose2[,1]
In the following sections, you can replace all references to Glucose1[,1] with A1 after making this assignment.
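
After an assignment like this, it can be worth confirming that the new object is a plain numeric vector of the expected length. A minimal sketch, again using an invented data frame (toy and T1 are hypothetical names, with T1 playing the role of A1):

```r
# Hypothetical toy data frame (values are invented)
toy <- data.frame(test1 = c(14, -3, 32, 24),
                  test2 = c(5, 10, -7, 18))

T1 <- toy[, 1]    # extract the first column, as A1 <- Glucose1[,1] does

is.vector(T1)     # TRUE when the column came out as a plain vector
length(T1)        # number of cases
summary(T1)       # minimum, quartiles, median, mean, and maximum
```
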

We can look at the distributions of these measurements in R in several ways:


Plotting in R