One Sample: Inference about the mean

Just as for the Binomial distribution, there are two basic procedures of statistical inference that we need to understand. The first is the hypothesis test, which addresses a question of the form "is a particular hypothesis consistent with the observed data?". The second is the confidence interval, which represents the set of values for a parameter of interest that are consistent with the observed data. It is possible to interpret each question as a version of the other: a particular hypothesis is consistent with the data (at significance level alpha%) if it corresponds to a value of the parameter included in the corresponding (100-alpha)% confidence interval. Conversely, a parameter value contained in the (100-alpha)% confidence interval must correspond to a hypothesis that would not be rejected at significance level alpha%. The conventional value for alpha is .05, or 5%.

Here are some data on the accuracy of radon detectors. 12 radon detectors were placed in a chamber containing 105 picocuries/liter of radon, and the resulting measurements were recorded:

91.9 97.8 111.4 122.3 105.4 95 
103.8 99.6 96.6 119.3 104.8 101.7
Store the data in a variable called "radon". You can do this using the "scan" function:
     radon <- scan()
The prompt will switch from ">" to the current observation number, initially "1:". Then type or paste in the data, separated by blanks or returns. Entering a blank line terminates the scan function. Be careful when pasting the values in that you don't paste in a blank line at the beginning - scan will think you are done before you start!

Alternatively, you can use the combine function (c()):

     radon <- c(91.9, 97.8, 111.4, 122.3, 105.4, 95,
                103.8, 99.6, 96.6, 119.3, 104.8, 101.7)

Start by examining the data, i.e. make a stem-and-leaf plot and a normal quantile plot. It appears that the data are skewed, but not outlandishly so.
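For example, using the base R graphics functions (qqline adds a reference line through the quartiles of the normal plot):

stem(radon)    # stem-and-leaf plot
qqnorm(radon)  # normal quantile plot
qqline(radon)  # reference line through the quartiles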


Hypothesis tests: Let's test the hypothesis that the mean measurement is 105. The alternative hypothesis is that the mean of the measurements is not equal to 105, i.e. the "two-tailed" alternative. This involves computing the so-called "t-statistic": the numerator is the difference between the estimate and the hypothesized value; the denominator is the standard error of the estimate.
m <- mean(radon)

s <- sd(radon)
	# or s <- sqrt(var(radon))

n <- length(radon) 
	# this should be 12

tstat <- (m - 105)/(s/sqrt(n))
You should get -0.31947... here. For a hypothesis test with significance level .05, compare the t statistic to the "critical point" of the t distribution, i.e. the quantile which marks off the distance from the mean leaving 2.5% in each tail: qt(.975,11), about 2.2. The 11 is the degrees of freedom (df), which measures the effective sample size available for computing the variance or standard deviation.

We would reject the null hypothesis at the 5% significance level if the absolute value of the t-statistic were greater than the critical value. The observed t-statistic (-.31...) is not even close to the rejection region, so we should announce that we fail to reject the null hypothesis at the 5% significance level.
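In code, the comparison looks like this (using tstat and n as computed above):

qt(.975, n-1)               # the critical value, about 2.2
abs(tstat) > qt(.975, n-1)  # FALSE, so we fail to reject H0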

Computing the t-statistic as above is equivalent to defining a rejection region as we did with the binomial: choose the rejection region so that the test has the desired significance level. In this case, we would reject H0: mu = 105 if the sample mean (m) is too far from 105: reject if

m < 105 - 2.2*s/sqrt(n) 
or if
m > 105 + 2.2*s/sqrt(n) 
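Using qt directly instead of the rounded value 2.2, the boundaries of the rejection region work out to roughly:

105 - qt(.975, n-1)*s/sqrt(n)  # about 99.03
105 + qt(.975, n-1)*s/sqrt(n)  # about 110.97
	# m = 104.13 lies well inside (99.03, 110.97)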
The t distribution is closely related to the normal distribution. You can think of it as incorporating the extra variation caused by our ignorance of the true SD. If we knew the true SD, sigma, instead of estimating it from the data, the rejection region for a 5% significance level would use the normal quantile 1.96 = qnorm(.975) instead:
m < 105 - 1.96*sigma/sqrt(n) 
or
m > 105 + 1.96*sigma/sqrt(n) 

Finally, we can compute the p-value, or probability of getting a more extreme result than the observed one under the null hypothesis:

2*pt(-0.3194729,11)
We multiply by 2 to include both tails of the distribution. The p-value is about .75, again indicating no evidence against the null hypothesis.

How would you compute the p-value if the t value were positive? You should convince yourself that

2*pt(-abs(tstat),df)
will always work, at least if you fill in the appropriate values for the t statistic and its degrees of freedom.
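For the radon data, with tstat and n as computed above, that is:

2*pt(-abs(tstat), n-1)
	# about .75, matching the result above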


Confidence Intervals: We can compute a 95% confidence interval for the mean of the distribution as follows:
m <- mean(radon)

s <- sd(radon)

n <- length(radon) 

m - qt(.975,11)*s/sqrt(n)
	# compute the lower endpoint of the CI

m + qt(.975,11)*s/sqrt(n)
	# compute the upper endpoint

You should get the interval (98.1625, 110.1042); clearly this interval includes the hypothesized value, 105, so we have no evidence that the mean of the distribution (the expected value of a measurement under the specified conditions) is different from 105.
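A compact way to get both endpoints in one command is to multiply by a vector of signs:

m + c(-1,1)*qt(.975,11)*s/sqrt(n)
	# returns both endpoints: 98.1625 110.1042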


The t.test function

You should know how to do these computations given the mean, standard deviation, and sample size, since occasionally you will want to do the computation when you don't have the raw data. If you do have the raw data in S, the "t.test" function can be used to do the computations. To test the null hypothesis that mu = 105, enter:
t.test(radon,mu=105)
The t.test function by default tests the null hypothesis mu=0. Since we wanted a different null hypothesis, we had to specify it explicitly. The t.test function prints the p-value for the two-sided alternative hypothesis, and prints a 95% confidence interval by default. These default choices can be modified; try:
    t.test(radon,mu=105,conf.level=.99)
or
    t.test(radon,mu=105, alternative = "less")
The choices for "alternative" are "two.sided", "less", and "greater". Look at the confidence intervals that are produced when you choose a one-sided hypothesis test.
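The t.test function returns its results as a list, so you can save them and extract individual components; for example:

result <- t.test(radon, mu=105)
result$conf.int  # the 95% confidence interval
result$p.value   # the p-value for the two-sided alternative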

Power computations

R has a function that does the power computations for us, called "power.t.test". Try looking at the help file:
      ?power.t.test
It takes several arguments. The standard usage is to specify delta, sd, and n; the power will then be computed. You can also use it to estimate the sample size needed to achieve a given level of power, by specifying delta, sd, and power instead. The "observed power" is computed using the observed delta, sd, and n. For the radon data, testing mu=105 at the 5% significance level:
power.t.test(n=12, delta=.87,sd=9.4,type="one.sample")
The power is quite small: only about .06, barely above the significance level. Here delta was the difference between the mean (104.13) and 105 - the sign doesn't matter, just the absolute magnitude of the difference. How big a sample size would we need to reject the null hypothesis if the true mean were 104 (so delta = 1) and the sd were 9.4? Suppose we want a sample size big enough to give power .95:
power.t.test(delta=1,sd=9.4,power=.95,type="one.sample")
Look at the value for n in the output!
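You can also save the output and extract the sample size; note that the computed n is not a whole number, so round up:

pw <- power.t.test(delta=1, sd=9.4, power=.95, type="one.sample")
ceiling(pw$n)    # the required sample size, rounded up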

Bootstrapping

The bootstrap is nothing more than a simulation done using the empirical distribution (i.e. the observed data) as if it were the complete population. In other words, we sample with probabilities determined by the empirical distribution function, in the hope that the distribution of our statistic in samples from the empirical distribution approximates the distribution of the statistic when sampling from the population or true distribution.

To start with, create a dataset for storing the bootstrapped mean values, then use a "for loop" to repeatedly resample the radon data. If the data were the result of a more complicated sampling procedure, such as a multi-level design, then our resampling would have to mimic the original sampling plan. Here we think the sample was IID, i.e. the measurement error for each device is independent of the errors for the others and all devices have the same accuracy, so the bootstrap resampling is completely straightforward:


boot.m <- rep(0,1000)  # we will generate 1000 means here;
                       # in practice you might want 10,000 or more
                       # (after all, it's just computer time!)

for (i in 1:1000)
{
	x <- sample(radon, replace=TRUE)  # resample 12 values with replacement
	boot.m[i] <- mean(x)
}

The for loop may take a few seconds to run, so be patient! We can now look at the distribution of the sample mean for samples of size 12 from the empirical distribution. We could compute the standard deviation of this distribution and use it in a CI in place of the standard error (sd(radon)/sqrt(12)), but that assumes the sampling distribution of the mean is symmetric. To avoid making any such assumption, just construct a 95% CI directly from the quantiles of the bootstrap distribution:
quantile(boot.m,c(.025,.975))
Since the data are at least roughly normally distributed, the bootstrap CI should at least approximately agree with the CI we computed above based on the t distribution. Here it may be a bit narrower, since the t-based CI tends to be a bit conservative (i.e. too wide) when the data are skewed. The basic bootstrap CI can be improved upon, but it is already at least approximately correct.
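As a check, the standard deviation of the bootstrap means should be close to the usual standard error of the mean:

sd(boot.m)                     # SD of the bootstrap distribution
sd(radon)/sqrt(length(radon))  # the usual standard error, about 2.71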

R has a library of functions for bootstrapping: the "boot" package, included with standard R distributions, automates this kind of resampling.
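A minimal sketch of its use, choosing the percentile method to match the quantile CI above: boot() takes a statistic written as a function of the data and a vector of resampled indices, and boot.ci() computes several styles of confidence interval.

library(boot)
b <- boot(radon, function(d,i) mean(d[i]), R=1000)
boot.ci(b, type="perc")
	# percentile CI, comparable to the quantile CI above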



Albyn Jones
August 2004