the bootstrapping library

R has a package of functions for bootstrapping, called boot, which is distributed with R. To load it, type:

        library(boot)

Take a quick look at the help file (type ?boot) before proceeding. It contains more detail than elementary usage requires, so a once-over is enough.

The most important functions are boot() and boot.ci(). To use boot(), you need to write a function that computes the statistic of interest. Suppose we want a confidence interval for the coefficient of variation. First, let's generate a random sample of size 30 from a Gamma distribution with shape parameter 3:

         x = rgamma(30,3)

The coefficient of variation is the standard deviation divided by the mean:

         sd(x)/mean(x)
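For a Gamma distribution with shape k and rate r, the mean is k/r and the standard deviation is sqrt(k)/r, so the coefficient of variation is 1/sqrt(k) whatever the rate. Since rgamma(30,3) uses shape 3 (and the default rate 1), the value we are trying to estimate is 1/sqrt(3), roughly 0.577. A small check, assuming an arbitrary seed chosen only to make the results reproducible:

```r
set.seed(141)                 # arbitrary seed, for reproducibility only
x = rgamma(30, 3)             # sample of size 30 from Gamma(shape = 3, rate = 1)
cv.hat = sd(x) / mean(x)      # sample coefficient of variation
cv.true = 1 / sqrt(3)         # population CV of a Gamma with shape 3

# With a very large sample the estimate settles near the true value
big = rgamma(1e6, 3)
cv.big = sd(big) / mean(big)
c(sample = cv.hat, large = cv.big, true = cv.true)
```

With only 30 observations, cv.hat can land noticeably away from 0.577; that sampling variability is exactly what the bootstrap will try to measure.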

boot() expects this function to take two arguments: the data, and a vector of indices that picks out the observations in one bootstrap sample. For the cv, that looks like this:

         cv = function(x,n) {sd(x[n])/mean(x[n])}
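boot() calls the statistic as cv(data, indices) with a fresh index vector for every replicate, and also once with the indices 1, ..., n to get the estimate on the original data. A quick check that the two-argument form agrees with the direct calculation, and that a repeated index really duplicates an observation; the simulated x and the particular index vector idx are just for illustration:

```r
cv = function(x, n) { sd(x[n]) / mean(x[n]) }

set.seed(141)                 # arbitrary seed
x = rgamma(30, 3)

# Identity indices reproduce the original sample, hence the original CV
cv.orig = cv(x, seq_along(x))

# Hypothetical index vector: x[1] used twice, x[30] left out
idx = c(1, 1:29)
cv.alt = cv(x, idx)
```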

The indexing x[n] does the resampling: n is a vector of indices drawn with replacement, so if n contains the number 1, the bootstrap sample x[n] includes x[1]; if 1 appears twice in n, x[1] appears twice in x[n], and so forth. This is easier to see if we sort the data first:

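         y = sort(x)                                  # sorted copy of the data
         n = sample(1:30, size = 30, replace = TRUE)  # indices drawn with replacement
         sort(n)                                      # which indices were drawn, and how often
         sort(y[n])                                   # the corresponding bootstrap sample, sorted
         y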

You should be able to match each value in sort(y[n]) with a value in y, and see that every index repeated in n produces a repeated value in y[n], while indices missing from n correspond to observations left out of the bootstrap sample.
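Because y is sorted, sorting the resampled values gives the same result as indexing y by the sorted indices, and tabulate() counts how many times each observation was drawn. A sketch, again with an arbitrary seed:

```r
set.seed(141)                                  # arbitrary seed
x = rgamma(30, 3)
y = sort(x)
n = sample(1:30, size = 30, replace = TRUE)

# Sorted resample equals y indexed by the sorted index vector
same = identical(sort(y[n]), y[sort(n)])

# counts[i] = number of copies of y[i] in the bootstrap sample
counts = tabulate(n, nbins = 30)
counts
# Observations never drawn (count 0) are absent from y[n]
sum(counts == 0)
```

On average about a third of the observations (roughly (1 - 1/30)^30) are left out of any one bootstrap sample.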

Now we are ready to run the bootstrap simulation, here with R = 999 resamples:

         B = boot(x,cv,R=999)
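For ordinary (nonparametric) resampling, the core of what boot() does can be imitated with a short loop: draw an index vector with replacement, evaluate the statistic, repeat. This is a simplified sketch for intuition, not the package's implementation; boot() additionally supports stratified, weighted and parametric resampling. The names t.star and t0 are chosen here for illustration:

```r
cv = function(x, n) { sd(x[n]) / mean(x[n]) }

set.seed(141)                 # arbitrary seed
x = rgamma(30, 3)
R = 999

# Each replicate: resample indices with replacement, recompute the CV
t.star = replicate(R, {
  idx = sample(seq_along(x), replace = TRUE)
  cv(x, idx)
})

t0 = cv(x, seq_along(x))      # estimate on the original data
summary(t.star)
```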

B is a list with several components. The most important is B$t, which holds the bootstrap replicates of the statistic cv (a matrix with one row per resample); B$t0 holds the value computed from the original data. Let's look at the replicates:

         mean(B$t)
         sd(B$t)
         plot(density(B$t))
         quantile(B$t,c(.025,.975))  # this is a "percentile CI"
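The bootstrap estimates of bias and standard error, which print(B) reports, come directly from the replicates in B$t and the original estimate B$t0. A sketch, assuming the boot package is installed and using an arbitrary seed:

```r
library(boot)

cv = function(x, n) { sd(x[n]) / mean(x[n]) }
set.seed(141)                 # arbitrary seed
x = rgamma(30, 3)
B = boot(x, cv, R = 999)

dim(B$t)                      # 999 rows (replicates), 1 column (one statistic)
bias = mean(B$t) - B$t0       # bootstrap estimate of bias
se = sd(B$t)                  # bootstrap estimate of standard error
c(original = B$t0, bias = bias, std.error = se)
```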

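Finally, boot.ci() computes several types of 95% confidence intervals (conf = 0.95 is the default). Because cv() returns no variance estimate, the studentized interval cannot be computed, and boot.ci() notes this with a warning: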

         boot.ci(B)
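The interval types and the coverage can also be requested explicitly through the type and conf arguments; listing the types leaves out the studentized interval, which needs variance estimates that cv() does not supply. The result is a list whose components normal, basic, percent and bca hold the limits; for the last three, the lower and upper endpoints are in columns 4 and 5. A sketch, with a 90% level chosen only for illustration and an arbitrary seed:

```r
library(boot)

cv = function(x, n) { sd(x[n]) / mean(x[n]) }
set.seed(141)                 # arbitrary seed
x = rgamma(30, 3)
B = boot(x, cv, R = 999)

ci = boot.ci(B, conf = 0.90, type = c("norm", "basic", "perc", "bca"))
ci$normal                     # conf, lower, upper
ci$percent[4:5]               # percentile interval endpoints
ci$bca[4:5]                   # BCa interval endpoints
```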

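The "Percentile" interval should be very close to the one we computed above with the quantile function (boot.ci() interpolates between order statistics slightly differently). The "Normal" interval, built from the bootstrap estimates of bias and standard error, will be accurate if the statistic is close to normally distributed. The "Basic" interval reflects the percentile limits around the original estimate, which works well when the distribution of the estimate minus the true value does not depend strongly on the true value. The "BCa" (bias-corrected and accelerated) interval is similar to the "Percentile" interval, but adjusts it for estimated bias and skewness.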
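The Normal, Basic and Percentile intervals can be written down directly from the replicates t* and the original estimate t0: the Normal interval is (t0 - bias) ± z·SE, the Basic interval reflects the percentile quantiles around t0, giving (2·t0 - q.975, 2·t0 - q.025), and the Percentile interval is (q.025, q.975). A base-R sketch with hand-rolled resampling (the variable names are illustrative); since boot.ci() interpolates its quantiles on the normal scale, its endpoints will differ slightly from these:

```r
cv = function(x, n) { sd(x[n]) / mean(x[n]) }

set.seed(141)                 # arbitrary seed
x = rgamma(30, 3)
t0 = cv(x, seq_along(x))
t.star = replicate(999, cv(x, sample(seq_along(x), replace = TRUE)))

alpha = 0.05
z = qnorm(1 - alpha / 2)
bias = mean(t.star) - t0
se = sd(t.star)
qs = unname(quantile(t.star, c(alpha / 2, 1 - alpha / 2)))

normal  = c(t0 - bias - z * se, t0 - bias + z * se)   # normal approximation
basic   = c(2 * t0 - qs[2], 2 * t0 - qs[1])           # reflected percentiles
percent = qs                                          # plain percentiles
rbind(normal, basic, percent)
```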


Albyn Jones
August 2004