Lab 1: Getting Started With R

Here are some examples of basic R commands that you will find useful. Try typing them into the R console, pressing Enter (Return) after each command. If you get an error message such as "Error: object 'x' not found", you may have skipped an earlier example that created the object, or you may have misspelled its name. Object names in R are case-sensitive: x is not the same as X. Names should start with a letter and can also contain digits, periods ("."), and underscores ("_").

It is helpful to create datasets with informative names: "FluDeaths" rather than "fd".

Creating datasets

Large datasets are best created in a spreadsheet or text editor and then imported into R. Small datasets can be entered directly in R.
n <- 1:10         # integer sequences
n

x <- seq(1,2,.1) # other sequences
x
                # direct data entry 
z <- c(2.3, 1.2, 4.4, 4.7, -1.2, 6.3)      
z

u <- runif(10)    # generate 10 random numbers in (0,1).
u

y <- n[1:6]+z    # add two datasets of the same length to create a third
y
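R adds two datasets element by element, so they should normally have the same length. When the lengths differ, R reuses ("recycles") the shorter one, and it prints a warning if the longer length is not a multiple of the shorter one. A minimal sketch with made-up vectors a and b:

```r
a <- c(1, 2, 3)
b <- c(10, 20, 30)
a + b                     # equal lengths: 11 22 33
a + 100                   # a single number is recycled: 101 102 103
c(1, 1, 1, 1) + c(1, 2)   # length 4 and length 2: 2 3 2 3 (no warning)
length(a) == length(b)    # TRUE: a quick check before adding
```

Recycling is convenient for single numbers, but with two longer datasets it usually signals a mistake, so check the lengths first.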

Colors <- c("red", "orange", "yellow", "green","blue", "violet")
Colors

MoreColors <- rep(Colors,3)  # repeated values
MoreColors
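rep() and seq() take a few more arguments that are handy for building datasets. The sketch below uses only the standard arguments times, each, by, and length.out:

```r
Colors <- c("red", "orange", "yellow", "green", "blue", "violet")
rep(Colors, times = 2)      # the whole vector twice
rep(Colors, each = 2)       # each element twice: "red" "red" "orange" ...
seq(0, 1, length.out = 5)   # 5 evenly spaced values: 0.00 0.25 0.50 0.75 1.00
seq(10, 1, by = -3)         # counting down: 10 7 4 1
length(rep(Colors, 3))      # 18
```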

Basic Mathematics

The basic arithmetic operators are +, -, *, /, and ^ (exponentiation).
1+1              
3^2
8*2-3^2
R has the standard mathematical functions:
sqrt(81)
cos(pi)  # pi is an R constant
log(10)	 # natural log
log(1)
exp(log(1))
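When an expression mixes operators, R applies ^ first, then * and /, then + and -. Parentheses override this order, and log() accepts a base argument for logarithms other than the natural log. A few checks:

```r
8*2 - 3^2             # ^ first, then *, then -: 16 - 9 = 7
-2^2                  # ^ binds tighter than the minus sign: -4
(-2)^2                # parentheses change the order: 4
2^3^2                 # ^ groups right to left: 2^(3^2) = 512
log(100, base = 10)   # logarithm with another base: 2
log10(1000)           # shortcut for base 10: 3
log2(8)               # shortcut for base 2: 3
```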

Selecting subsets of datasets

In R, the square brackets [] allow selection of subsets by case number or by logical comparisons.
Colors
Colors[2]           # case selection by subscript
Colors[c(1,3,5)]

               # case selection by logical comparison
N <- 1:20
N > 10
N[N > 10]

which(MoreColors == "blue")   # positions of "blue" in MoreColors
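Square brackets also accept negative subscripts (which drop cases) and logical conditions combined with & ("and") and | ("or"). A short sketch reusing the Colors and N datasets from above:

```r
Colors <- c("red", "orange", "yellow", "green", "blue", "violet")
Colors[-1]                # a negative subscript drops that case
Colors[-c(1, 2)]          # drop the first two cases
N <- 1:20
N[N > 10 & N %% 2 == 0]   # & combines conditions: even numbers above 10
N[N < 3 | N > 18]         # | means "or": 1 2 19 20
sum(N > 10)               # TRUE counts as 1, so this counts the cases: 10
```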

Operations on datasets

Arithmetic operators are applied to datasets component-wise:
N <- 1:10
N
2*N
N^2
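Because operations work element by element, many calculations that would need a loop in other languages take one line in R. A few examples on the same N:

```r
N <- 1:10
N + N          # element by element: 2 4 6 ... 20
N * N          # same values as N^2
sum(N^2)       # sum of squares: 385
cumsum(1:5)    # running totals: 1 3 6 10 15
N / 2          # / returns decimals: 0.5 1.0 1.5 ...
```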

Useful R functions

help(mean)     # ask for help!!
?median

u <- runif(10)
u
sort(u)
mean(u)
median(u)
max(u)
min(u)
sum(u)
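runif() gives different numbers each time, so your answers will not match a neighbor's. set.seed() makes the random numbers reproducible (the seed 141 below is arbitrary). The second half applies more summary functions to a small fixed dataset v, chosen for illustration, so the results can be checked by hand:

```r
set.seed(141)        # any integer works; the same seed gives the same numbers
u1 <- runif(10)
set.seed(141)
u2 <- runif(10)
identical(u1, u2)    # TRUE

v <- c(2, 4, 4, 4, 5, 5, 7, 9)
mean(v)       # 5
median(v)     # 4.5
range(v)      # 2 9 (min and max together)
var(v)        # sample variance: 32/7
sd(v)         # standard deviation: sqrt(var(v))
summary(v)    # min, quartiles, median, mean, max
```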

User-Defined Functions

test <- function(a,b)   # start of function definition
{
  # body of the function
  a^2-b^2               # the value of the last expression is returned
}                       # end of function definition
test(3,2)
test(3,0)
test(0,2)
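Arguments can have default values, can be supplied by name in any order, and, because ^ and - work element by element, the function also accepts whole datasets. The name test2 is hypothetical, chosen so it does not overwrite test:

```r
test2 <- function(a, b = 0)   # b defaults to 0 if it is not supplied
{
  a^2 - b^2
}
test2(3)              # uses b = 0: 9
test2(b = 2, a = 3)   # named arguments in any order: 5
test2(1:3, 1)         # works on a whole dataset: 0 3 8
```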
For the CS crowd, here's an example of recursion:
Fact <- function(n)
{
  if(n < 0) stop("n must be non-negative")
  if(n == 0) return(1)
  n*Fact(n-1)          # the function calls itself
}

Fact(4)
Fact(-1)               # stops with the error message

# what's happening here?
Fact(3/2)
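For comparison, here is a sketch of a loop-based version that also rejects inputs that are not whole numbers. FactLoop is a hypothetical name. R's built-in factorial() behaves differently: it is defined through the gamma function, factorial(n) = gamma(n + 1), so it accepts non-integers.

```r
FactLoop <- function(n)
{
  if (n < 0 || n != round(n)) stop("n must be a non-negative whole number")
  result <- 1
  for (k in seq_len(n)) result <- result * k   # seq_len(0) is empty, so 0! = 1
  result
}
FactLoop(4)      # 24
FactLoop(0)      # 1
factorial(4)     # built-in version: 24
factorial(3/2)   # gamma(2.5): the built-in accepts non-integers
```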
	       

Plots

x <- rnorm(50)
y <- x + rnorm(50)/3
hist(x,col="yellow")
plot(x,y,pch=19,col="blue")
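Plots usually need axis labels and a title, and a fitted line often helps. The sketch below also shows one way to save a plot to a file: pdf() opens a file, and dev.off() closes it. The temporary file name is an assumption made for illustration, and set.seed(1) only makes the example repeatable:

```r
set.seed(1)
x <- rnorm(50)
y <- x + rnorm(50)/3
out <- tempfile(fileext = ".pdf")
pdf(out)                                  # draw into a PDF file instead of the screen
plot(x, y, pch = 19, col = "blue",
     xlab = "x", ylab = "y", main = "y versus x")
abline(lm(y ~ x), col = "red", lwd = 2)   # add the least-squares line
dev.off()                                 # close the file so it is written
file.exists(out)                          # TRUE
```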

Built-in Data Sets

help(data)
data()
data(co2)
help(co2)
plot(co2)         # plot the time-series
plot(co2,col="red")
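Before plotting a built-in dataset, it helps to check what kind of object it is. co2 is a monthly time series ("ts"), so time-series tools such as window() and decompose() apply; mtcars is a data frame. A few inspection commands:

```r
class(co2)           # "ts": a time series
start(co2)           # first observation: 1959, month 1
frequency(co2)       # 12 observations per year (monthly)
length(co2)          # 468 monthly values
recent <- window(co2, start = c(1990, 1))   # keep 1990 onward
plot(decompose(co2)) # trend, seasonal, and random components

head(mtcars)         # first six rows of a built-in data frame
dim(mtcars)          # 32 rows, 11 columns
```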

Importing Data Sets
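The usual way to import a dataset is read.csv(), which reads a comma-separated text file and uses the first line as column names. To keep the sketch self-contained, it first writes a tiny made-up file (the county and votes values are hypothetical) to a temporary location:

```r
path <- tempfile(fileext = ".csv")   # a throwaway file, for illustration only
writeLines(c("county,votes",
             "A,120",
             "B,340",
             "C,95"), path)
Votes <- read.csv(path)   # the first line becomes the column names
Votes
str(Votes)                # votes is read as integer
mean(Votes$votes)         # $ picks out a column: 185
```

For files separated by spaces or tabs, read.table(path, header = TRUE) is the more general form.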

Data Types

x <- seq(1,2,.2)            
x      # numeric

a <- c("a","a","b","b","c","c") 
a      # character data

b <- factor(a)
b      # factor: a coded categorical dataset

X <- data.frame(x,b)
X      # data frame: a table of data
       # rows are cases, columns are variables

X[1,]
X[,1]
X[X[,2] == "a",]
       # select the rows for which column 2 is "a"
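Columns of a data frame can also be selected by name with $, and a few functions summarize the structure of a data frame or a factor. A sketch that rebuilds the same X:

```r
x <- seq(1, 2, .2)
b <- factor(c("a", "a", "b", "b", "c", "c"))
X <- data.frame(x, b)
str(X)               # one line per column, with its type
levels(X$b)          # the categories: "a" "b" "c"
table(X$b)           # counts in each category: 2 2 2
X$x                  # $ selects a column by name (same as X[,1])
X[X$b == "a", ]      # same rows as X[X[,2] == "a",]
subset(X, x > 1.5)   # rows where x exceeds 1.5
```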

Example

FL <- read.csv("http://people.reed.edu/~jones/141/FL.dat")
names(FL)
plot(Buchanan~Bush,data=FL,pch=19)
# or
with(FL,plot(Bush,Buchanan,pch=19,col="blue"))
# or
attach(FL)    # makes the columns of FL available by name
plot(Bush,Buchanan,pch=19,col="blue")
identify(Bush,Buchanan,row.names(FL))
# click on points to label them; press Esc (or right-click) to stop

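### log scale
plot(log(Buchanan)~log(Bush),data=FL,pch=19,col="darkgreen")
title("Florida 2000 by County")
FL.lm <- lm(log(Buchanan)~log(Bush),data=FL)
abline(coef(FL.lm),col="red",lwd=2)      # add the fitted regression line
PB <- which(row.names(FL)=="PalmBeach")   # row number of Palm Beach County
symbols(log(Bush)[PB],log(Buchanan)[PB],circles=.2,
        inches=FALSE,add=TRUE)             # circle Palm Beach on the plot
detach(FL)                                # undo attach(FL) when finished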
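Residuals from the fitted line give a numerical way to spot an unusual county instead of reading it off the plot. Since the FL file may not be reachable, the sketch below uses simulated data: sim.Bush, sim.Buchanan, and the planted outlier in row 50 are all hypothetical and are not the Florida results.

```r
set.seed(2000)
n <- 60
sim.Bush <- exp(runif(n, 8, 12))                              # made-up vote counts
sim.Buchanan <- exp(-5 + log(sim.Bush) + rnorm(n, sd = 0.2))
sim.Buchanan[50] <- sim.Buchanan[50] * 8                      # plant one unusual county
sim.lm <- lm(log(sim.Buchanan) ~ log(sim.Bush))
coef(sim.lm)                      # intercept and slope
which.max(residuals(sim.lm))      # the case farthest above the line: 50
```

With the real data, which.max(residuals(FL.lm)) would report the county lying farthest above the fitted line.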