Scatterplots
We will use a dataset on 1993 auto models, most of it collected by Consumers Union and made available over the internet through the Statlib archive. Paste the source code at this link into your R session to load the data we will be using. The data frame is called "Cars93"; use the attach command to access its columns by name.
Here are some examples of the basic plot features that will be useful for looking at relationships between variables.
plot(Hpwr,Hmpg,main="Horsepower vs Miles/Gal")
?ifelse
ifelse(US,"gold","blue") # that is really a shortcut for ifelse(US==1,"gold","blue")
Now, make plots!
plot(Hpwr,Hmpg,pch=19, col=ifelse(US,"red","darkgreen"))
# US cars are drawn in red, so "Domestic" pairs with red in the legend
legend(locator(1),legend=c("Foreign","Domestic"), pch=c(19,19),col=c("darkgreen","red"))
To use the locator function, move the cursor into the plot window and click in an empty region where you would like to place the legend. To see what the locator function is doing, try calling it by itself on an existing plot:
locator(1)
plot(Hpwr,Hmpg,pch=ifelse(US,19,17))
legend(locator(1),c("Foreign","Domestic"),pch=c(17,19))
dev.off()
plot.new()
par(usr=c(-1,21,0,1))
for(i in 0:20) {
  points(i, .5, pch=i)
  text(i, .35, i)
}
text(9, .75, 'Samples of "pch" Parameter')
box()
Now, let's combine color with the plot symbol:
COL <- ifelse(US,"blue","red")
PCH <- ifelse(US,9,5)
plot(Hpwr,Hmpg,pch=PCH,col=COL)
legend(locator(1),c("Foreign","Domestic"),pch=c(5,9),col=c("red","blue"))
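One way to keep legend entries in step with the plotted colors and symbols is to define each mapping once and derive both the plot and the legend from it. The sketch below assumes, as in the examples above, that US is coded 1 for domestic and 0 for foreign models. The small data vectors and the names `grp`, `col_map`, and `pch_map` are made up for illustration, and a fixed `"topright"` position stands in for `locator(1)` so that it runs without a mouse click.

```r
# Hypothetical stand-ins for three Cars93 columns
US   <- c(1, 0, 0, 1, 0)
Hpwr <- c(140, 200, 172, 110, 92)
Hmpg <- c(31, 25, 26, 34, 33)

# Define each mapping once, keyed by group label
col_map <- c(Foreign = "red", Domestic = "blue")
pch_map <- c(Foreign = 5, Domestic = 9)

# Group label per car (assumes US == 1 means domestic)
grp <- ifelse(US == 1, "Domestic", "Foreign")

plot(Hpwr, Hmpg, pch = pch_map[grp], col = col_map[grp])
# The legend reads from the same vectors, so labels and symbols stay matched
legend("topright", legend = names(col_map), pch = pch_map, col = col_map)
```

Changing a color or symbol then means editing one vector, and the plot and legend follow together.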
plot(Hpwr,Hmpg,type="n")
text(Hpwr,Hmpg,Cyl)
range(Wgt)
w <- (Wgt-1695)/(4105-1695)
We subtract the minimum and divide by the range, so the lightest car gets mapped to 0 and the heaviest to 1. The plot uses the rgb function to color code the points. I have chosen the plot symbol numbered "19" (filled circle) instead of the default open circle to make the color more visible:
plot(Cmpg,Hmpg,pch=19,col=rgb(w,0,0))
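The constants 1695 and 4105 are specific to this dataset. A minimal sketch of the same rescaling, with the minimum and range computed from the data, follows. The helper name `rescale01` and the sample weights are hypothetical.

```r
# Map a numeric vector onto [0, 1]: subtract the minimum, divide by the range
rescale01 <- function(x) {
  r <- range(x, na.rm = TRUE)
  (x - r[1]) / (r[2] - r[1])
}

Wgt <- c(1695, 2705, 3560, 4105)   # hypothetical weights spanning the same range
w <- rescale01(Wgt)

# rgb() turns red intensities in [0, 1] into hex color strings
cols <- rgb(w, 0, 0)
cols[c(1, length(cols))]           # lightest car is black, heaviest is pure red
```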
plot(Hpwr,Hmpg)
identify(Hpwr,Hmpg)
Click on a point with mouse button 1 to select it. To terminate the identify function on the Mac, hit the escape key or, with a two-button mouse, right-click in the graphics window. On Unix systems, pressing mouse button 2 will terminate the function. Upon termination, identify() returns the case numbers of the selected cases. The identify function is useful for picking out interesting cases in a plot, such as outliers. It saves you the trouble of reading through the dataset manually to find a case with values matching one you see in a scatterplot.
plot(Hpwr,Hmpg)
identify(Hpwr,Hmpg,Model)
To save the case numbers returned by the identify function, just assign the output to a variable:
plot(Hpwr,Hmpg)
n <- identify(Hpwr,Hmpg,Model)
n
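The saved case numbers are ordinary row indices, so they can be used to pull out the selected cases. Because identify needs mouse clicks, the sketch below substitutes a hypothetical result for n and uses a small illustrative data frame, here called `cars`, in place of Cars93.

```r
# Illustrative stand-in for a few rows of Cars93 (values are not authoritative)
cars <- data.frame(
  Model = c("Integra", "Legend", "Metro", "Corvette"),
  Hpwr  = c(140, 200, 55, 300),
  Hmpg  = c(31, 25, 50, 25)
)

# Pretend the user clicked on the 3rd and 4th points;
# identify() would return indices like these
n <- c(3, 4)

# Look up the full records for the selected cases
cars[n, ]

# Or redraw the plot with only those cases labelled
plot(cars$Hpwr, cars$Hmpg)
text(cars$Hpwr[n], cars$Hmpg[n], cars$Model[n], pos = 3)
```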
plot(Hpwr,Hmpg)
HH.lm <- lm(Hmpg ~ Hpwr)
abline(coef(HH.lm))
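HH.lo <- loess(Hmpg~Hpwr)
# lines() joins points in the order given, so sort the fitted values by Hpwr first
ord <- order(HH.lo$x)
lines(HH.lo$x[ord],fitted(HH.lo)[ord],lty=3)
legend(locator(1),c("LSE","Loess"),lty=c(1,3))
Note that the plot function will work on transformed data too: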
plot(Hpwr,1/Hmpg)
To produce what is known as a log-log plot, simply plot the logarithm of each variable:
plot(log(Hpwr),log(Hmpg))
In different scales, the relationship between the variables may appear quite different!
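One reason the log-log scale can change the picture: a power-law relationship y = a * x^b is curved on the original scale but becomes a straight line with slope b once both variables are logged. The sketch below checks this on synthetic data. The constants and variable names are invented for illustration and are not taken from Cars93.

```r
# Synthetic power law: y = 2000 * x^(-0.8), no noise
x <- c(50, 100, 150, 200, 300)
y <- 2000 * x^(-0.8)

# On the log-log scale, log(y) = log(2000) - 0.8 * log(x),
# so a straight-line fit recovers the exponent as the slope
fit <- lm(log(y) ~ log(x))
coef(fit)

plot(log(x), log(y))
abline(fit)   # the points fall on the fitted line
```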
IceCoreCO2 <- read.csv("~/Desktop/merged_ice_core_yearly.csv", header=FALSE,skip=28)
head(IceCoreCO2)
names(IceCoreCO2) <- c("Year","CO2")
attach(IceCoreCO2)
plot(Year,CO2,pch=19,col="red",col.lab="green",col.axis="purple")
?loess
CO2.lo <- loess(CO2~Year,data=IceCoreCO2)
# draw the fitted curve in order of increasing Year
ord <- order(CO2.lo$x)
lines(CO2.lo$x[ord],fitted(CO2.lo)[ord],col="blue",lwd=2)
title("Atmospheric CO2, ppm",col.main="magenta")
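When drawing a smooth curve with lines, the x values need to be in increasing order, since lines simply joins points in the order given. The sketch below illustrates this on synthetic data and assumes nothing about the ice-core file: lowess returns its x values already sorted, while fitted values from loess come back in the original data order and are reordered before plotting. All names and numbers are invented for illustration.

```r
set.seed(1)
x <- sample(1:50)                  # deliberately unsorted
y <- sqrt(x) + rnorm(50, sd = 0.2)

plot(x, y)

# lowess() returns a list with components x and y, sorted by x
lw <- lowess(x, y)
lines(lw, lty = 1)

# loess() keeps the data order, so sort before joining the points
lo  <- loess(y ~ x)
ord <- order(x)
lines(x[ord], fitted(lo)[ord], lty = 3)

legend("topleft", c("lowess", "loess"), lty = c(1, 3))
```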