Binomial Inference: testing p=1/2

Charles Darwin did an experiment comparing the heights of cross- and self-pollinated plants (Zea mays, or corn, to be precise; see "The Effects of Cross and Self Fertilisation in the Vegetable Kingdom", 1876). If you don't have the data, you can reload it using the R command

Darwin <- structure(
      list(Pot = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4),
       Crossed = c(23.5, 12, 21, 22, 19.125, 21.5, 22.125, 20.375, 
                   18.25, 21.625, 23.25, 21, 22.125, 23, 12), 
          Self = c(17.375, 20.375, 20, 20, 18.375, 18.625, 18.625, 
                   15.25, 16.5, 18, 16.25, 18, 12.75, 15.5, 18)), 
         .Names = c("Pot", "Crossed", "Self"), 
          class = "data.frame", 
       row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", 
                      "11", "12", "13", "14", "15"))
Remember - to look at an object just enter its name: Darwin. The resulting data table should look like this:
       Pot Crossed   Self 
     1   1  23.500 17.375
     2   1  12.000 20.375
     3   1  21.000 20.000
     4   2  22.000 20.000
     5   2  19.125 18.375
     6   2  21.500 18.625
     7   3  22.125 18.625
     8   3  20.375 15.250
     9   3  18.250 16.500
    10   3  21.625 18.000
    11   3  23.250 16.250
    12   4  21.000 18.000
    13   4  22.125 12.750
    14   4  23.000 15.500
    15   4  12.000 18.000

Darwin was interested in demonstrating that cross-pollinated plants were healthier than self-pollinated. The standard approach to questions like this one is to frame it as a hypothetical: if there were really no difference between the two groups, what would we expect to see?

To translate this into a question about the binomial distribution, start by converting the pairs of values above to Bernoulli trials. Call it a success (1) if the crossed plant is taller than the self-pollinated plant with which it is paired, and a failure (0) otherwise.

attach(Darwin)
X <- sum(Crossed > Self)
You should have 13 successes out of the 15 trials. Our hypothetical scenario, known as the Null Hypothesis, corresponds to the claim that the population proportion of 1's is 1/2. In other words, the cross-pollinated and self-pollinated plants are equally likely to be taller.

Assuming independence, we have a set of 15 Bernoulli(1/2) trials, of which 13 were successes. Suppose we wish to choose a rejection region of size no bigger than alpha=1/20. To see which values to include, take a look at the binomial probabilities, rounded to 5 decimal places for ease of reading:

round(dbinom(0:15,15,1/2),5)
Let's try using 0:3 and 12:15 for our rejection region.

Significance Level

The significance level is the probability of getting an outcome in the rejection region, computed under the null hypothesis. In this instance, that is the sum of the binomial probabilities for the outcomes in the rejection region R={0,1,2,3,12,13,14,15}. To compute the significance level in R, create a dataset containing the values in the rejection region, and then sum the dbinom values for those outcomes:
R <- c(0:3,12:15)
sum(dbinom(R,15,1/2))
You should verify that if we make the rejection region any larger, then alpha will be bigger than 1/20 or .05.
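For example, adding the next most extreme outcomes (first 11, then 4 as well) to the region:
sum(dbinom(c(0:3,11:15),15,1/2))
sum(dbinom(c(0:4,11:15),15,1/2))
The first sum is about .077 and the second about .12, both bigger than .05.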

Now, we look at the observed value, 13, and note that it is in the rejection region. The conclusion: we reject the null hypothesis p=1/2 at the 0.035 significance level. Either p is not 1/2 or we were unlucky.

Power

It is easy to compute the power (probability of rejecting the null hypothesis, given an alternative value for p). We just need to sum the binomial probabilities for the rejection region, using the alternative value of p. Suppose that we were interested in the alternative "p=2/3":
sum(dbinom(R,15,2/3))
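The power against this alternative is only about 0.21. To evaluate the power against several alternative values of p at once, one option is sapply:
sapply(c(0.55, 0.6, 2/3, 0.75, 0.9), function(p) sum(dbinom(R, 15, p)))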
It is often useful to compute the power for many values of "p" at once, and plot against "p". Here is a function that will do that for a given rejection region RR or a given significance level alpha:

bin.pwr <- function(n,p0=1/2,RR=NULL,alpha=.05,COL="red")
{
# plot power vs p for H0: p=p0, default p0=1/2
# RR if present is the chosen rejection region
# alpha is the target significance level

if(is.null(RR)){
    # no rejection region supplied: build one by adding outcomes in order
    # of increasing null probability until the total reaches alpha
    X = 0:n
    db = dbinom(X,n,p0)               # null probability of each outcome
    sdb = sort(db)                    # probabilities, least likely first
    cb = cumsum(sdb)
    m = max((1:(n+1))[cb <= alpha])   # largest set with total probability <= alpha
    R = X[db <= sdb[m]]               # outcomes at least as unlikely as the cutoff
} else {
    R = RR
}
Alpha <- sum(dbinom(R,n,p0))
p <- seq(.01,.99,.01)
P <- p
for(i in 1:length(p))
   {
   P[i] <- sum(dbinom(R,n,p[i]))
   }
plot(p,P,ylab="Power",type="l",col=COL)
title(paste("Power curve for n=",n,"   alpha =",round(Alpha,4)))
}

The type="l" in the plot command is an "ell", not a "one". It tells the plot function to make a line plot. To use the function with a specified rejection region, create a dataset R with the values defining the rejection region, and enter the sample size n, for example:
R <- c(0:3,12:15)
bin.pwr(15,p0=1/2,RR=R)
The minimum of this curve occurs at p=1/2, the value specified by the null hypothesis. The power increases for alternatives farther away from 1/2. Does that make sense to you?

You may also use the function with a specified target significance level without specifying a rejection region:

bin.pwr(15,p0=1/2,alpha=.05)

If the null hypothesis is something other than p=1/2, you must specify its value in the function. For example, given a rejection region with values stored in R, sample size n, and the null hypothesis that p=.9, use

bin.pwr(n,.9,RR=R)
The type="l" in the plot command is an "ell", not a "one". It tells the plot function to make a line plot.

Try it with some different sample sizes and rejection regions:

par(mfrow=c(2,2))
R10 <- c(0:1,9:10)
bin.pwr(10,RR=R10)
R20 <- c(0:5,15:20)
bin.pwr(20,RR=R20)
R40 <- c(0:13,27:40)
bin.pwr(40,RR=R40)
R80 <- c(0:30,50:80)
bin.pwr(80,RR=R80)
How does the power curve change as the sample size gets larger? (We haven't held alpha exactly fixed, but the actual significance levels all fall roughly between .02 and .04.)
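To see the actual significance level for each of these rejection regions, sum the null probabilities as before:
sum(dbinom(R10,10,1/2))
sum(dbinom(R20,20,1/2))
sum(dbinom(R40,40,1/2))
sum(dbinom(R80,80,1/2))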

p-values

The classical hypothesis test starts with construction of a rejection region of a size no greater than the specified significance level. The p-value is an attempt to directly measure the improbability of an observed result. With Darwin's data, we could compute the probability of seeing exactly 13 successes in 15 trials:
dbinom(13,15,1/2)
However, for a large number of trials, even the most likely outcome can be quite improbable. For example, try computing the probability of tossing 5000 heads in 10000 trials with a fair coin.
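In R that is a single dbinom call:
dbinom(5000,10000,1/2)
The answer is only about 0.008, even though 5000 is the single most likely outcome.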

What we really want here is the probability of a result at least as unlikely as the one we observed. For a symmetric distribution, this is the set of outcomes at least as far from the expected count as the one we got. The expected value for n=15 and p=1/2 is 7.5, so we want the probability of seeing a result at least as far from 7.5 as 13 is. That includes 13, 14, and 15 in one direction, and 2, 1, and 0 in the other direction.

Note: in general, a result "at least as unlikely as the one we observed" should be defined by its probability under H0, not by its distance from the expected value. In a symmetric distribution these are the same, but in a skewed distribution they may be different! Can you see how to compute the p-value directly from the definition? For our example with n=15, p=1/2, and X=13, try

p=dbinom(0:15,15,1/2)
sum(p[p <= dbinom(13,15,1/2)])

Here is a plot of the distribution with a line showing the probability of our observed result:

par(mfrow=c(1,1))
barplot(dbinom(0:15,15,1/2),names.arg=c(0:15),space=0)
abline(h=dbinom(13,15,1/2))  
Note: abline(a,b) adds a line with intercept a and slope b; abline(h=y) adds a horizontal line at height y, which is what we used here.

Anyway, the desired calculation is easy to perform in R; just add up the binomial probabilities corresponding to those outcomes:

sum(dbinom(c(0:2,13:15),15,1/2))
Alternatively, we could use the pbinom function for this computation:
pbinom(2,15,1/2) + (1 - pbinom(12,15,1/2))
Either way, you should get the same answer: 0.00738..., which is less than a 1 percent chance. This quantity is called a p-value, which is shorthand for `the probability of seeing a result at least as unlikely as the one observed, if the null hypothesis were really true'.

WARNING: the p-value is NOT `the probability that the null hypothesis is correct'. That is not, in general, computable without more information, specifically an initial probability (prior) for the truth of the null hypothesis, to be updated by Bayes' rule.

The standard ritual of hypothesis testing is to decide that you have sufficient evidence that the null hypothesis is false (known as `rejecting the null hypothesis'), if the p-value is less than .05.

Testing other null hypotheses

Finally, we could test some other hypothesis here, such as p=3/4 instead of p=1/2. The procedure is the same; the only change is that the calculations would then be based on the assumption that p=3/4. What should we choose for the rejection region?
> # Binomial(15,3/4) probabilities
> cbind(0:15,round(dbinom(0:15,15,3/4),5))
      [,1]    [,2]
 [1,]    0 0.00000
 [2,]    1 0.00000
 [3,]    2 0.00000
 [4,]    3 0.00001
 [5,]    4 0.00010
 [6,]    5 0.00068
 [7,]    6 0.00340
 [8,]    7 0.01311
 [9,]    8 0.03932
[10,]    9 0.09175
[11,]   10 0.16515
[12,]   11 0.22520
[13,]   12 0.22520
[14,]   13 0.15591
[15,]   14 0.06682
[16,]   15 0.01336

barplot(dbinom(0:15,15,3/4),names.arg=c(0:15))
abline(dbinom(7,15,3/4),0)
The tails of the distribution are no longer the same: we need to choose the rejection region asymmetrically. For example, to achieve a significance level no bigger than .05, we could include in the rejection region {0:7, 15}. Verify that the significance level (the sum of the probabilities of outcomes in the rejection region, computed under the null hypothesis) is about .03. If we included 8 in the rejection region, the significance level would exceed .05; and if we let {0:8} be the rejection region, we would be including an outcome (8) with greater probability than the one we left out (15), which would not make sense. Outcomes with smaller probability must be stronger evidence against the null hypothesis!
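As a check:
sum(dbinom(c(0:7,15),15,3/4))
sum(dbinom(c(0:8,15),15,3/4))
The first sum is about .03; the second, which adds 8 to the region, is bigger than .05.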


Back to Darwin's data: can we reject the null hypothesis that p=3/4? Verify that the p-value for testing the null hypothesis that p=3/4 is about .38.
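One way to verify this is to apply the definition of the p-value directly, just as we did for the null hypothesis p=1/2:
p <- dbinom(0:15,15,3/4)
sum(p[p <= dbinom(13,15,3/4)])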

The binom.test() function

The R binom.test() function does the computations discussed above. The only disadvantage of using it is that you don't see explicitly what the rejection region is. Read the help file:
?binom.test
Try it with Darwin's data:
binom.test(13,15,p=1/2)
If you don't understand the arguments or the printed output for the binom.test() function, go back and reread the help file now!
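You can also extract the p-value by itself from the object that binom.test() returns (the default is a two-sided test), and compare it with the calculations above:
binom.test(13,15,p=1/2)$p.value
binom.test(13,15,p=3/4)$p.value
The first value should match the 0.00738... computed earlier, and the second should be close to the .38 found for the null hypothesis p=3/4.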


Albyn Jones
August 2004