Binomial Inference: testing p=1/2
Charles Darwin did an experiment comparing the heights of cross- and self-pollinated plants (Zea mays, or corn, to be precise; see "The Effects of Cross and Self Fertilisation in the Vegetable Kingdom", 1876). If you don't already have the data in your workspace, you can recreate it with the R command
Darwin <- structure(list(Pot = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4),
    Crossed = c(23.5, 12, 21, 22, 19.125, 21.5, 22.125, 20.375, 18.25, 21.625, 23.25, 21, 22.125, 23, 12),
    Self = c(17.375, 20.375, 20, 20, 18.375, 18.625, 18.625, 15.25, 16.5, 18, 16.25, 18, 12.75, 15.5, 18)),
    .Names = c("Pot", "Crossed", "Self"), class = "data.frame",
    row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15"))

Remember - to look at an object just enter its name: Darwin. The resulting data table should look like this:
   Pot Crossed   Self
1    1  23.500 17.375
2    1  12.000 20.375
3    1  21.000 20.000
4    2  22.000 20.000
5    2  19.125 18.375
6    2  21.500 18.625
7    3  22.125 18.625
8    3  20.375 15.250
9    3  18.250 16.500
10   3  21.625 18.000
11   3  23.250 16.250
12   4  21.000 18.000
13   4  22.125 12.750
14   4  23.000 15.500
15   4  12.000 18.000
Darwin was interested in demonstrating that cross-pollinated plants were healthier than self-pollinated ones. The standard approach to questions like this is to frame it as a hypothetical: if there were really no difference between the two groups, what would we expect to see?
To translate this into a question about the binomial distribution, start by converting the pairs of values above to Bernoulli trials. Call it a success (1) if the crossed plant is taller than the self-pollinated plant with which it is paired, and a failure (0) otherwise.
attach(Darwin)
X <- sum(Crossed > Self)

You should have 13 successes out of the 15 trials. Our hypothetical scenario, known as the Null Hypothesis, corresponds to the claim that the population proportion of 1's is 1/2. In other words, within each pair the cross-pollinated plant is just as likely as the self-pollinated plant to be the taller one.
Assuming independence, we have a set of 15 Bernoulli(1/2) trials, of which 13 were successes. Suppose we wish to choose a rejection region of size no bigger than alpha=1/20. To see which values to include, take a look at the binomial probabilities, rounded to 5 decimal places for ease of reading:
round(dbinom(0:15,15,1/2),5)

Let's try using 0:3 and 12:15 for our rejection region.
R <- c(0:3,12:15)
sum(dbinom(R,15,1/2))

You should verify that if we make the rejection region any larger, then alpha will be bigger than 1/20, or .05.
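One way to carry out that check is to enlarge the region by the next most extreme outcome on either side, for example:

sum(dbinom(c(0:4,12:15),15,1/2))   # region enlarged to include 4
sum(dbinom(c(0:3,11:15),15,1/2))   # region enlarged to include 11

Either way the total should come out above .05.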
Now, we look at the observed value, 13, and note that it is in the rejection region. The conclusion: we reject the null hypothesis p=1/2 at the 0.035 significance level. Either p is not 1/2 or we were unlucky.
What if p were really some value other than 1/2, say 2/3? The power of the test against that alternative is the probability that the observed count lands in the rejection region when p = 2/3:

sum(dbinom(R,15,2/3))

It is often useful to compute the power for many values of p at once and plot it against p. Here is a function that will do that for a given rejection region RR or a given significance level alpha:
bin.pwr <- function(n,p0=1/2,RR=NULL,alpha=.05,COL="red") {
  # plot power vs p for H0: p=p0, default p0=1/2
  # RR if present is the chosen rejection region
  # alpha is the target significance level
  if(is.null(RR)){
    X = 0:n
    db = dbinom(X,n,p0)
    sdb = sort(db)
    cb = cumsum(sdb)
    m = max((1:(n+1))[cb <= alpha])
    R = X[db <= sdb[m]]
  } else {
    R = RR
  }
  Alpha <- sum(dbinom(R,n,p0))
  p <- seq(.01,.99,.01)
  P <- p
  for(i in 1:length(p)) {
    P[i] <- sum(dbinom(R,n,p[i]))
  }
  plot(p,P,ylab="Power",type="l",col=COL)
  title(paste("Power curve for n=",n," alpha =",round(Alpha,4)))
}
R <- c(0:3,12:15)
bin.pwr(15,p0=1/2,RR=R)

The minimum point on this curve is the value at p=1/2, the null hypothesis. The power increases for alternatives farther away from 1/2. Does that make sense to you?
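If you would rather check a few values numerically than read them off the plot, a quick sketch along these lines (using the same rejection region R) does the job:

# power of the test at a few alternative values of p
sapply(c(.3, .5, .7, .9), function(p) sum(dbinom(R, 15, p)))

The smallest of these should be the one at p = .5.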
You may also use the function with a specified target significance level without specifying a rejection region:
bin.pwr(15,p0=1/2,alpha=.05)
If the null hypothesis is something other than p=1/2, you must specify its value in the function. For example, given a rejection region with values stored in R, sample size n, and the null hypothesis that p=.9, use
bin.pwr(n,.9,RR=R)

The type="l" in the plot command is an "ell", not a "one". It tells the plot function to make a line plot.
Try it with some different sample sizes and rejection regions:
par(mfrow=c(2,2))
R10 <- c(0:1,9:10)
bin.pwr(10,RR=R10)
R20 <- c(0:5,15:20)
bin.pwr(20,RR=R20)
R40 <- c(0:13,27:40)
bin.pwr(40,RR=R40)
R80 <- c(0:30,50:80)
bin.pwr(80,RR=R80)

How does the power curve change as the sample size gets larger? (We haven't held alpha exactly fixed, but all were close to .03.)
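To see the actual significance level of each of these regions, just sum the null probabilities over the region:

sum(dbinom(R10, 10, 1/2))
sum(dbinom(R20, 20, 1/2))
sum(dbinom(R40, 40, 1/2))
sum(dbinom(R80, 80, 1/2))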
A different way to summarize the evidence is to ask how surprising the observed result is. You might be tempted to simply compute the probability of exactly the outcome we saw:

dbinom(13,15,1/2)

However, for a large number of trials, even the most likely outcome can be quite improbable. For example, try computing the probability of tossing exactly 5000 heads in 10000 tosses of a fair coin.
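For instance:

dbinom(5000, 10000, 1/2)   # the single most likely outcome, yet its probability is under .01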
What we really want here is the probability of a result at least as unlikely as the one we observed. For a symmetric distribution, this is the set of outcomes at least as far from the expected count as the one we got. The expected value for n=15 and p=1/2 is 7.5. Thus we want the probability of seeing a result at least as far from 7.5 as 13 is. That includes 13, 14, and 15 in one direction, and 2, 1, and 0 in the other direction.

Note: In general a result "at least as unlikely as the one we observed" should be defined by its probability under H0, not by distance from the expected value. In a symmetric distribution these are the same, but in a skewed distribution they may be different! Can you see how to compute the p-value directly using the definition? For our example with n=15, p=1/2, and X=13, try
p <- dbinom(0:15,15,1/2)
sum(p[p <= dbinom(13,15,1/2)])
Here is a plot of the distribution with a line showing the probability of our observed result:
par(mfrow=c(1,1))
barplot(dbinom(0:15,15,1/2),names.arg=c(0:15),space=0)
abline(h=dbinom(13,15,1/2))

Note: abline(h=y) adds a horizontal line at height y; abline(a,b) adds a line with intercept a and slope b.
Anyway, the desired calculation is easy to perform in R: just add up the binomial probabilities corresponding to those outcomes:
sum(dbinom(c(0:2,13:15),15,1/2))

Alternatively, we could use the pbinom function for this computation:
pbinom(2,15,1/2) + (1 - pbinom(12,15,1/2))

Either way, you should get the same answer: 0.00738..., which is less than a 1 percent chance. This quantity is called a p-value, which is shorthand for "the probability of seeing a result at least as unlikely as the one observed, if the null hypothesis were really true".
WARNING: the p-value is NOT "the probability that the null hypothesis is correct". That is not, in general, computable without more information, specifically an initial probability (a prior) for the truth of the null hypothesis, to be updated by Bayes' rule.
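To see what such a Bayesian calculation would involve, here is a small illustrative sketch; the prior of 1/2 and the single point alternative p = 3/4 are assumptions chosen purely for illustration, not part of Darwin's problem:

prior.H0 <- 1/2                  # assumed prior probability that p = 1/2
lik.H0 <- dbinom(13, 15, 1/2)    # probability of the observed data if H0 is true
lik.H1 <- dbinom(13, 15, 3/4)    # probability of the data under the assumed alternative
post.H0 <- prior.H0 * lik.H0 / (prior.H0 * lik.H0 + (1 - prior.H0) * lik.H1)
post.H0                          # posterior probability of H0 -- not the same thing as the p-value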
The standard ritual of hypothesis testing is to decide that you have sufficient evidence that the null hypothesis is false (known as "rejecting the null hypothesis") if the p-value is less than .05.
So far our null distribution has been symmetric about its mean. What if the null hypothesis were instead, say, p=3/4? Look at the probabilities:

> # Binomial(15,3/4) probabilities
> cbind(0:15,round(dbinom(0:15,15,3/4),5))
      [,1]    [,2]
 [1,]    0 0.00000
 [2,]    1 0.00000
 [3,]    2 0.00000
 [4,]    3 0.00001
 [5,]    4 0.00010
 [6,]    5 0.00068
 [7,]    6 0.00340
 [8,]    7 0.01311
 [9,]    8 0.03932
[10,]    9 0.09175
[11,]   10 0.16515
[12,]   11 0.22520
[13,]   12 0.22520
[14,]   13 0.15591
[15,]   14 0.06682
[16,]   15 0.01336

barplot(dbinom(0:15,15,3/4),names.arg=c(0:15))
abline(dbinom(7,15,3/4),0)

The tails of the distribution are no longer the same: we need to choose a rejection region asymmetrically. For example, to achieve a significance level no bigger than .05, we could include {0:7, 15} in the rejection region. Verify that the significance level (the sum of the probabilities of outcomes in the rejection region, computed under the null hypothesis) is about .03. If we included 8 in the rejection region, the significance level would exceed .05; if we let {0:8} be the rejection region, then we would be including an outcome (8) with greater probability than the one we left out (15), which would not make sense. Outcomes with smaller probability must be stronger evidence against the null hypothesis!
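For example, to check those two claims:

sum(dbinom(c(0:7,15), 15, 3/4))    # level of the region {0:7, 15}, about .03
sum(dbinom(c(0:8,15), 15, 3/4))    # including 8 pushes the level above .05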
Of course, all of this is built into R: the binom.test() function carries out the exact binomial test and reports a p-value. Read its help file:

?binom.test

Try it with Darwin's data:
binom.test(13,15,p=1/2)

If you don't understand the arguments or the printed output for the binom.test() function, go back and reread the help file now!
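As a sanity check, you can compare the p-value that binom.test() reports with the one we computed by hand; for this symmetric null the two should agree:

bt <- binom.test(13, 15, p = 1/2)
bt$p.value                               # p-value extracted from the test object
sum(dbinom(c(0:2,13:15), 15, 1/2))       # the hand computation from above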