The bootstrapping library
R's boot package provides functions for bootstrapping. To load it, type:
library(boot)
Take a look at the help file (?boot) before proceeding. It contains
more detail than elementary usage requires, so a quick once-over is
enough.
The most important functions are boot() and boot.ci(). Suppose we are interested in computing a confidence interval for the coefficient of variation. First, let's generate a random sample from a Gamma distribution:
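set.seed(1)        # optional: makes the random draws (and results below) reproducible
x = rgamma(30, 3)  # 30 draws from a Gamma distribution with shape 3 (rate 1)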
The coefficient of variation is the standard deviation divided by
the mean:
sd(x)/mean(x)
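As a reference point for the sample value: a Gamma distribution with shape k has mean k/rate and standard deviation sqrt(k)/rate, so its population coefficient of variation is 1/sqrt(k), whatever the rate. For rgamma(30, 3) that is 1/sqrt(3), about 0.577. The sketch below checks this; the seed is arbitrary and the names true_cv and big are only for illustration.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
shape = 3
x = rgamma(30, shape = shape)    # same as rgamma(30, 3); rate defaults to 1

sample_cv = sd(x) / mean(x)      # coefficient of variation of the sample
true_cv   = 1 / sqrt(shape)      # population CV: (sqrt(k)/rate) / (k/rate) = 1/sqrt(k)
c(sample = sample_cv, population = true_cv)

# With a very large sample the estimate should settle close to 1/sqrt(3)
big = rgamma(1e5, shape = shape)
sd(big) / mean(big)
```

With only 30 observations the sample CV can land noticeably away from 0.577, which is exactly why a confidence interval is useful here.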
To use boot(), the statistic must be written as a function of two
arguments: the data, and a vector of indices that selects the bootstrap sample.
cv = function(x, n) {sd(x[n])/mean(x[n])}   # n: indices of the bootstrap sample
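One way to convince yourself the index argument behaves as intended: passing the identity indices 1:30 just returns the original data, so cv() then equals the plain sample CV (this is the value boot() reports as the original statistic). A random index vector drawn with replacement gives one bootstrap replicate. The seed and the name idx are arbitrary choices for this sketch.

```r
cv = function(x, n) { sd(x[n]) / mean(x[n]) }   # statistic in the form boot() expects

set.seed(2)                  # arbitrary seed
x = rgamma(30, 3)

# Identity indices reproduce the original sample, so this is the ordinary CV
all.equal(cv(x, 1:30), sd(x) / mean(x))

# A resampled index vector (hypothetical name idx) gives one bootstrap replicate
idx = sample(1:30, size = 30, replace = TRUE)
cv(x, idx)
```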
The "[n]" does the resampling: boot() passes in a vector of indices n,
and x[n] is the corresponding bootstrap sample. If n includes the
number 1, then x[n] includes x[1], and so forth. This is easier to see
if we sort the data first:
y = sort(x)                                  # sorted copy of the data
n = sample(1:30, size = 30, replace = TRUE)  # one set of bootstrap indices
sort(n)
sort(y[n])
y
You should be able to match up the original values (y) with the
indexed values (y[n]): every index that appears more than once in n
produces a repeated value in y[n], and any value whose index does not
appear in n is missing from y[n].
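Counting how often each index was drawn makes the repetitions explicit. Because sampling is with replacement, each observation is left out of a given resample with probability (1 - 1/30)^30, so on average roughly 36% of the original values are absent. The sketch below uses base R's tabulate(); the seed and variable names are illustrative.

```r
set.seed(3)                              # arbitrary seed
n = sample(1:30, size = 30, replace = TRUE)

counts = tabulate(n, nbins = 30)         # counts[i] = times index i was drawn
counts
sum(counts)                              # always 30: the resample has the original size
sum(counts == 0)                         # observations left out of this resample
(1 - 1/30)^30                            # expected fraction left out, about 0.36
```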
Now we are ready to run the bootstrap simulation. The argument R sets
the number of bootstrap replicates:
B = boot(x, cv, R = 999)
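To see what the simulation amounts to, here is a hand-written version of ordinary nonparametric resampling in base R: draw R index vectors with replacement and evaluate the statistic on each. This is only a sketch of the core idea, not boot()'s implementation, which adds options such as stratified sampling, weights and parallel execution. The name t_star and the seed are illustrative.

```r
cv = function(x, n) { sd(x[n]) / mean(x[n]) }

set.seed(4)                     # arbitrary seed
x = rgamma(30, 3)
R = 999                         # number of bootstrap replicates

# Each replicate: resample indices with replacement, then recompute the CV
t_star = replicate(R, cv(x, sample(seq_along(x), replace = TRUE)))

length(t_star)
mean(t_star)
sd(t_star)
```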
The object B (of class "boot") contains several components. The most
important is B$t, a matrix holding the bootstrap replicates of the
statistic cv, one row per replicate; B$t0 holds the value of cv on
the original data. Let's look at the replicates:
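A quick way to check the shape of these components: with a single statistic and R = 999, B$t has 999 rows and one column, and B$t0 equals sd(x)/mean(x). The seed below is arbitrary.

```r
library(boot)

cv = function(x, n) { sd(x[n]) / mean(x[n]) }
set.seed(5)                  # arbitrary seed
x = rgamma(30, 3)
B = boot(x, cv, R = 999)

B$t0          # statistic on the original data
dim(B$t)      # 999 rows (replicates) by 1 column (one statistic)
B$R           # number of replicates requested
head(B$t[, 1])
```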
mean(B$t)
sd(B$t)
plot(density(B$t))
quantile(B$t,c(.025,.975)) # this is a "percentile CI"
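The mean and standard deviation above are also what printing B summarizes: its "bias" column is the mean of the replicates minus the original estimate, and its "std. error" column is their standard deviation. A sketch of computing both by hand (seed arbitrary):

```r
library(boot)

cv = function(x, n) { sd(x[n]) / mean(x[n]) }
set.seed(6)                  # arbitrary seed
x = rgamma(30, 3)
B = boot(x, cv, R = 999)

bias = mean(B$t) - B$t0      # bootstrap estimate of bias
se   = sd(B$t)               # bootstrap estimate of standard error
c(original = B$t0, bias = bias, std.error = se)

B                            # print(B) shows the same three quantities
```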
Finally, the boot.ci() function computes various types
of confidence intervals:
boot.ci(B)
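You can also request specific interval types and a confidence level, and pull the endpoints out of the returned object. Leaving out the studentized type avoids asking for an interval that needs a variance estimate for each replicate, which cv does not return. In the result, the "basic", "percent" and "bca" components are matrices whose columns 4 and 5 hold the lower and upper limits, while "normal" keeps them in columns 2 and 3. The seed and the name ci are illustrative.

```r
library(boot)

cv = function(x, n) { sd(x[n]) / mean(x[n]) }
set.seed(7)                  # arbitrary seed
x = rgamma(30, 3)
B = boot(x, cv, R = 999)

ci = boot.ci(B, conf = 0.95, type = c("norm", "basic", "perc", "bca"))
ci

ci$normal[2:3]    # Normal interval limits
ci$basic[4:5]     # Basic interval limits
ci$percent[4:5]   # Percentile interval limits
ci$bca[4:5]       # BCa interval limits
```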
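The "Percentile" confidence interval should be very close to the one
we computed above with the quantile function (boot.ci() interpolates
between order statistics in its own way, so the two need not agree
exactly). The "Normal" interval will be accurate if our statistic is
close to normally distributed. The "Basic" interval reflects the
percentile limits about the original estimate. The "BCa" (bias-corrected
and accelerated) interval is similar to the "Percentile" interval, but
adjusts for estimated bias and skewness.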
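The relationship between the "Basic" and "Percentile" intervals can be checked directly. The basic lower limit is 2*t0 minus the upper percentile limit, and the basic upper limit is 2*t0 minus the lower percentile limit, so when the replicates are skewed the basic interval flips that skew around the original estimate. A sketch, with an arbitrary seed:

```r
library(boot)

cv = function(x, n) { sd(x[n]) / mean(x[n]) }
set.seed(8)                  # arbitrary seed
x = rgamma(30, 3)
B = boot(x, cv, R = 999)

ci    = boot.ci(B, type = c("basic", "perc"))
perc  = ci$percent[4:5]      # percentile limits (lower, upper)
basic = ci$basic[4:5]        # basic limits (lower, upper)

# Basic limits are the percentile limits reflected about t0
all.equal(basic, 2 * B$t0 - rev(perc))
```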