Data frames are S objects (data structures) that combine features of matrices and lists: a data frame is a list of variables, all containing the same number of observations. Typically, the variables are sets of measurements on a collection of cases, so that each row of the data frame is the set of measurements for one case (subject), and each column is the set of measurements for all subjects on a single variable.
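A minimal sketch of this dual nature, written in R (an implementation of the S language; other S systems may differ in detail). The data frame `d` and its column names are made up for illustration.

```r
# Hypothetical data: three cases measured on two variables.
d <- data.frame(height = c(150, 160, 170), weight = c(50, 60, 70))

is.list(d)      # TRUE: a data frame is a list of columns
length(d)       # 2: one list component per variable
dim(d)          # 3 2: cases by variables, as for a matrix
d$weight        # list-style access to one variable
d[2, "weight"]  # matrix-style access to one measurement
```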
To create a data frame from existing variables:

X <- data.frame(x,y,z)

Here is a simple example, illustrating the fact that the columns don't all have to be the same type. Columns of a data frame can be numeric, character, logical, or other types, such as factors.
x <- 1:5
y <- letters[1:5]
z <- rnorm(5)
X <- data.frame(x,y,z)
X
  x y          z
1 1 a -0.8358069
2 2 b  0.8515970
3 3 c -0.1151393
4 4 d -0.7857153
5 5 e  0.2684005

Similarly, if our data is already in the form of a matrix:
X <- data.frame(X)
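As a rough sketch of the matrix case in R: the matrix `m` and its column names are illustrative assumptions, not part of the original example. When the matrix has column names, they carry over as the variable names.

```r
# Hypothetical 3 x 2 matrix with named columns.
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))
M <- data.frame(m)   # each matrix column becomes a variable
names(M)             # "a" "b"
is.data.frame(M)     # TRUE
```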
To read a data frame from a text file whose first line holds the column names:

X <- read.table("filename",header=T)
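A small round trip may make the expected file layout clearer. This sketch is in R and uses a temporary file with made-up contents in place of "filename"; the helper calls `tempfile`, `writeLines`, and `unlink` are R functions and may not exist in other S systems.

```r
# Write a small, made-up table to a temporary file, then read it back.
f <- tempfile()
writeLines(c("A B", "1 2.5", "2 3.5", "3 4.5"), f)
X <- read.table(f, header = TRUE)  # the first line supplies the column names
names(X)                           # "A" "B"
dim(X)                             # 3 2
unlink(f)                          # remove the temporary file
```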
Rows, columns, and single elements can be selected with matrix-style subscripts:

X["row",]   # select the row labeled "row"
X[,"col"]   # select the column labeled "col"
X[2,]       # select row 2
X[,3]       # select column 3
X[2,3]      # select the element in row 2, column 3
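The subscripts above can be tried on a small data frame. This is a sketch in R; the data, the row labels "r1" to "r3", and the column names are invented for illustration.

```r
# Made-up data frame with labelled rows and columns of different types.
X <- data.frame(A = c(10, 20, 30),
                B = c("u", "v", "w"),
                C = c(TRUE, FALSE, TRUE),
                row.names = c("r1", "r2", "r3"))

X["r2", ]               # one row: still a data frame, since the columns differ in type
X[, "A"]                # one column: a plain vector, 10 20 30
X[2, 3]                 # a single element: FALSE
X[, "A", drop = FALSE]  # keep a one-column result as a data frame
```

Selecting a row returns a data frame while selecting a column returns a vector; in R, `drop = FALSE` is one way to keep the column result as a data frame.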
Attaching a data frame places it on the search list, so that its columns can be referred to directly by name:

attach(X)
plot(A,B)

vs.

plot(X[,"A"],X[,"B"])

Note: if there is a variable in the .Data directory named "A", then the reference to "A" will select it, rather than the desired column of the data frame, unless the search order is modified. To "detach" a data frame, use the "search" function to find the position of the data frame in the search list, and then use the "detach" function to remove it from the search list.
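The attach, search, and detach cycle might look like the following in R. This is only a sketch: the data frame is made up, and the way the attached entry is named in the search list (here "X") is R's behaviour and may differ in other S systems.

```r
X <- data.frame(A = 1:3, B = c(2.5, 3.5, 4.5))  # made-up data

attach(X)                    # X goes on the search list
pos <- match("X", search())  # look up its position by name
mean(B)                      # column B is now visible by name
detach(pos = pos)            # remove X from the search list again
```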
Some functions handle data frames directly:

plot(X)

has a special meaning, and

X.lm <- lm(A ~ B,data=X)

causes S to fit a "linear model", i.e. a regression line, to the specified columns of X.
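As a quick check of what the fit returns, here is a sketch in R on made-up data that lies exactly on a known line, so the fitted coefficients can be compared with the true ones. The values are assumptions chosen for illustration.

```r
# Made-up data lying exactly on the line A = 1 + 2 * B.
X <- data.frame(B = c(1, 2, 3, 4))
X$A <- 1 + 2 * X$B

X.lm <- lm(A ~ B, data = X)  # regress column A on column B
coef(X.lm)                   # intercept close to 1, slope close to 2
```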