Creating and editing datasets

Perhaps the simplest method for creating a data.frame is to use a spreadsheet to create the dataset, then export as either plain text or .csv (comma separated text format). You should not include special characters or embedded blanks in variable names. A plain text file can be imported to R by read.table(); a .csv file can be imported with read.table(...,header=TRUE,sep=",") or by read.csv().
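As a minimal sketch of the two import routes for a .csv file, the example below writes a small made-up file to a temporary location and reads it back both ways. The file contents and the names csv_file, d1 and d2 are illustrative assumptions, not part of any real dataset.

```r
# Write a tiny comma-separated file to a temporary location (made-up data).
csv_file <- tempfile(fileext = ".csv")
writeLines(c("id,height,group",
             "1,170.5,a",
             "2,165.0,b",
             "3,180.2,a"), csv_file)

# Two ways to import the same file.
d1 <- read.csv(csv_file)
d2 <- read.table(csv_file, header = TRUE, sep = ",")

# Inspect the structure of the imported data.frame.
str(d1)
```

For a simple file like this one, both calls should give the same data.frame; read.csv() just saves typing the header and sep arguments.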

The data.entry function can be used to enter raw data via a primitive spreadsheet interface. I don't like it, but your mileage may vary. In my opinion, it is better to create a file in plain text tabular format which you can then read with the scan() or read.table() functions. Currently the appropriate Mac program is called TextEdit. Make sure that the file format is set to plain text in the preferences menu.

Note: for R on the Macintosh systems in ETC211 you may need to change the working directory before using the export/import functions. Pull down the tools menu, select "change working directory", and select "Desktop". Don't forget to hit the "continue" button to apply the change.

If you have a single variable, enter the data with spaces between the values. Be sure that you type a return after the last value entered. You should save the file to the Desktop or to your own space on the home server. You can then load the data into R using the scan() function:

   x <- scan("junk.dat")
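If you want to try this without creating a file by hand, the following sketch writes a few made-up values to a temporary file and reads them back with scan(). The name dat_file and the values are assumptions for illustration only.

```r
# Create a plain text file holding one variable, values separated by spaces.
dat_file <- tempfile(fileext = ".dat")
writeLines("2.3 4.1 5.0 3.7", dat_file)

# scan() reads the values into a numeric vector.
x <- scan(dat_file)

# The result behaves like any other numeric vector.
length(x)
mean(x)
```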

If you have several variables, enter the data in columns with spaces between the values. Put the names of the variables at the top of each column. Remember that variable names should not have embedded spaces or special characters. Be sure that you enter a return after the last line of data. You can then load the data into R using the read.table() function:

   Junk <- read.table("junk.dat", header=TRUE)

If you have a categorical variable, it is simplest to use labels for the categories that don't have embedded spaces or other special characters. You can put double quotes around values if necessary. Use NA for any missing values.
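Here is a small sketch of a file with several variables, including a categorical variable, a quoted label containing a space, and a missing value. The data, the file name tab_file, and the choice of stringsAsFactors=TRUE (to get a factor regardless of R version) are assumptions for illustration.

```r
# Made-up tabular file: variable names on the first line, one case per line.
tab_file <- tempfile(fileext = ".dat")
writeLines(c('subject score treatment',
             '1 12.5 control',
             '2 NA "high dose"',
             '3 9.8 control'), tab_file)

# Read the table; character columns are converted to factors.
Junk <- read.table(tab_file, header = TRUE, stringsAsFactors = TRUE)

# NA in the file becomes a missing value in R.
is.na(Junk$score)

# The quoted label is kept as a single category.
levels(Junk$treatment)
```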


The data.entry() function invokes a spreadsheet interface for manual data entry. It needs to know the names of the variables and the data types. The simplest way to use data.entry is as follows:

   x <- numeric(100)   # create a numeric variable of length 100
   y <- numeric(100)   # another numeric variable
   a <- character(100) # create a character variable
   data.entry(x, y, a)

Navigate in the spreadsheet using the mouse. When you quit the spreadsheet, the values are saved in the named variables. You can then collect them in a data.frame if desired.
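Collecting the variables in a data.frame might look like the sketch below. Because data.entry() is interactive, the values here are typed in directly as a stand-in for what you would have entered in the spreadsheet; the values and the name Junk are assumptions for illustration.

```r
# Stand-in values; in practice these would have been filled in via data.entry(x, y, a).
x <- c(1.2, 3.4, 5.6)
y <- c(10, 20, 30)
a <- c("low", "mid", "high")

# Combine the three variables into one data.frame, one column per variable.
Junk <- data.frame(x, y, a)

# Check the result.
str(Junk)
```

The columns take their names from the variables, so Junk$x, Junk$y and Junk$a refer to the original values.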


The edit() function allows you to edit files that already exist. It invokes a simple text editor. When you close the window, be sure to select the "save" option.