
Data Structures

In addition to the basic vectors of numeric or character data (variables), S uses several other data structures.

One of the most important is the data.frame, which has become the standard data structure expected by S statistical modelling functions. It resembles a more basic data structure, the matrix. Once you have read about matrices and lists, see the section on data frames for more details.

A matrix is simply a rectangular array of numbers. There are several ways to create matrices; the most common are pasting together several columns or rows of the same length (`cbind()' and `rbind()', respectively) and using the `matrix' command to build a matrix from a sequence of values.

Here is an example of the use of the `cbind()' function.

> x
[1] 1 2 3 4 5

> X<-cbind(x,x^2,x^3)  
> # form a matrix with x, x^2, x^3 as the columns

> X
     [,1] [,2] [,3] 
[1,]    1    1    1
[2,]    2    4    8
[3,]    3    9   27
[4,]    4   16   64
[5,]    5   25  125
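
The text mentions `rbind()' but only demonstrates `cbind()'. The following is a minimal sketch of the row-wise counterpart, written for R (a dialect of S) and assuming the same vector x as above; the name R for the result is only illustrative.

```r
x <- 1:5                      # same vector as in the cbind() example

# rbind() pastes the vectors together as rows instead of columns
R <- rbind(x, x^2, x^3)

dim(R)                        # 3 rows, 5 columns
R[2, ]                        # second row: 1 4 9 16 25

# The result is the transpose of the cbind() matrix
all(R == t(cbind(x, x^2, x^3)))   # TRUE
```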

Elements of a matrix may be referenced by subscripting: the entry in the i-th row and j-th column is X[i,j]:

> X[1,1]     # select the entry in the first row and first column
[1] 1

> X[2,3]     # select the entry in the 2nd row, 3rd column
[1] 8

> X[,1]      # select the whole first column
[1] 1 2 3 4 5

> X[1,]      # select the whole first row
[1] 1 1 1
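
Subscripts may also be vectors, which selects several rows or columns at once. The sketch below is written for R (a dialect of S) and rebuilds the matrix X from the `cbind()' example so that it runs on its own; the names sub and tail2 are only illustrative.

```r
x <- 1:5
X <- cbind(x, x^2, x^3)       # same matrix as in the cbind() example

sub <- X[2:3, c(1, 3)]        # rows 2 and 3, columns 1 and 3
dim(sub)                      # 2 2

# A negative subscript drops the named row (or column)
tail2 <- X[-1, 2]             # column 2 without the first row: 4 9 16 25
```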

The `matrix' command accepts as parameters nrow, the number of rows in the matrix; ncol, the number of columns; and byrow, a logical value giving the order in which the data are filled in: `F' (the default) fills the matrix in sequence down the columns, while `T' fills it across the rows.

 
> m<-matrix(1:15,nrow=5,ncol=3)
> m
     [,1] [,2] [,3] 
[1,]    1    6   11
[2,]    2    7   12
[3,]    3    8   13
[4,]    4    9   14
[5,]    5   10   15

> m1<-matrix(1:15,nrow=5,ncol=3,byrow=T)
> m1
     [,1] [,2] [,3] 
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12
[5,]   13   14   15
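
As a small check on how the fill order works, the following sketch (R syntax, with byrow written as TRUE rather than T) shows that only one dimension needs to be given when the other follows from the length of the data, and that filling by row matches filling the transposed shape by column. The name m2 is only illustrative.

```r
# Only nrow is given; ncol is worked out from the length of the data
m2 <- matrix(1:15, nrow = 5)
dim(m2)                                   # 5 3

# Filling a 5 x 3 matrix by row gives the same result as filling
# a 3 x 5 matrix by column and transposing it with t()
m1 <- matrix(1:15, nrow = 5, ncol = 3, byrow = TRUE)
identical(m1, t(matrix(1:15, nrow = 3, ncol = 5)))   # TRUE
```
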

S data structures also have a set of `attributes' associated with them. For a matrix these include the dim attribute, i.e. the number of rows and columns in the matrix, and, optionally, the dimnames attribute, which stores labels for the rows and columns of the matrix. Elements of a matrix can be referenced by row or column label as well as by subscript. The `dimnames' are also used as row and column labels when a matrix is printed.

> dimnames(m)<-list(c("row 1","row 2", "row 3","row 4", "row 5"),
+     c("col 1", "col 2", "col 3"))

> # note that S uses `+' as a prompt when a command is not finished
> # it means: continue on the next line

> m
      col 1 col 2 col 3 
row 1     1     6    11
row 2     2     7    12
row 3     3     8    13
row 4     4     9    14
row 5     5    10    15

> m["row 1",]
 col 1 col 2 col 3 
     1     6    11
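
The attributes themselves can be inspected with `dim()' and `dimnames()'. The sketch below, in R syntax, rebuilds the labelled matrix so that it runs on its own; using `paste()' to generate the labels is a shortcut assumed here, not the method used in the example above.

```r
m <- matrix(1:15, nrow = 5, ncol = 3)

# paste() builds "row 1", ..., "row 5" and "col 1", ..., "col 3"
dimnames(m) <- list(paste("row", 1:5), paste("col", 1:3))

dim(m)                   # 5 3
dimnames(m)[[2]]         # "col 1" "col 2" "col 3"

# Labels and numeric subscripts refer to the same elements
m["row 2", "col 3"]      # 12
m[2, 3]                  # 12
```
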

Another type of data structure, the list, appears in the example above. A list is simply a collection of other data structures, in this case two vectors of labels whose lengths must match the dimensions of the matrix. The elements of a list may be given labels by a specification of the form `list(label=element, ...)'. Many S functions, such as the lm function, return their results in the form of lists.

> list(lower=letters[1:10],upper=LETTERS[1:10])
$lower:
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"

$upper:
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J"

The `letters' and `LETTERS' objects are S system datasets, which can be useful for labeling things.
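
Labelled elements of a list can be extracted with `$' or with double brackets. The following is a minimal sketch in R syntax; the name alpha and the small data set passed to lm are made up for illustration only.

```r
alpha <- list(lower = letters[1:10], upper = LETTERS[1:10])

names(alpha)          # "lower" "upper"
alpha$lower[3]        # "c"
alpha[["upper"]][3]   # "C"
length(alpha)         # 2 (the list has two elements)

# The object returned by lm is itself a list with labelled elements
x <- 1:5
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)   # illustrative values, not real data
fit <- lm(y ~ x)
is.list(fit)          # TRUE
fit$coefficients      # fitted intercept and slope
```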

You will want to read about data.frames too.



Albyn Jones
jones@reed.edu
Tue Jun 25 11:03:47 PDT 1996