Probability Computations Related to Binomial Distributions |
For a binomial(n,p) random variable X, the R functions involve the abbreviation "binom":
dbinom(k,n,p) # binomial(n,p) density at k: Pr(X = k) pbinom(k,n,p) # binomial(n,p) CDF at k: Pr(X <= k) qbinom(P,n,p) # binomial(n,p) P-th quantile rbinom(N,n,p) # N binomial(n,p) random variables help(Binomial) # documentation on the functions related # to the Binomial distribution
choose(n,k)For example, choose(4,2) should be 6.
Check your answer: Solutions
For a binomial(6,1/3) random variable named X, compute the probability of 2 successes, Pr(X=2):
> dbinom(2,6,1/3) [1] 0.3292181Compare to the direct computation (don't type all this into R, it's an equation not an R command :-):
Pr(X = 2) = ((6*5*4*3*2*1)/(2*1*4*3*2*1))*(1/3)^2*(2/3)^4 = 15*(1/3)^2*(2/3)^4 = 0.3292181or in R:
choose(6,2)*(1/3)^2*(2/3)^4
We can compute the probabilities for the whole sample space at once if we like with:
dbinom(0:6,6,1/3)A nicer way to display those probabilities is in a table, using the data.frame() function:
data.frame(X = 0:6, Prob = dbinom(0:6,6,1/3))When printing tables like this, it is often better to round off the entries to fewer significant digits:
M <- data.frame(X = 0:6, Prob = dbinom(0:6,6,1/3)) round(M,4)
We can also produce the theoretical histogram for repeated trials of a given binomial experiment. Here is a function to draw the binomial density "curve", you can paste it in to your R session:
hist.binom <- function(n, p, ...) { # plots a relative frequency histogram of the binomial distribution k <- 0:n p <- dbinom(k, n, p) names(p) <- as.character(0:n) barplot(p,space=0, ...) }A relative frequency histogram is just a barplot with no space between the bars, where the total area is 1. The height of each bar here the probability associated with that binomial outcome. To use the hist.binom function, you must specify the values of n and p. For example, to get the histogram of a binomial(6,1/3) distribution, use
hist.binom(6,1/3)Try experimenting with the color ("col") argument; use help(colors) or colors() to see what colors R knows by name.
hist.binom(6,1/3,col="yellow")
First, set up the plot window to hold 4 plots using par(mfrow=c(2,2)). Then, plot the binomial densities for the following values of n and p:
n p 20 1/2 20 1/4 20 3/4 20 9/10Where is the peak of the density (the mode) located in each case? What is the relationship between the mode and the values of n and p?
Check your answer: Solutions
For a binomial(6,1/3) random variable X, compute the probability that X is less than 3; in other words, Pr(X <= 2):
pbinom(2,6,1/3)Compare to summing the density (ie adding up the areas under the binomial histogram:
dbinom(0,6,1/3)+dbinom(1,6,1/3)+dbinom(2,6,1/3) or sum(dbinom(0:2,6,1/3))
We can find the probability that we get more than a given number of successes by subtraction: for the probability that X is greater than 2, Pr(X > 2):
1 - pbinom(2,6,1/3)We could of course compute the probability of that event by summing the individual probabilities:
sum(dbinom(3:6,6,1/3))Finally, R has another option for this computation:
pbinom(2,6,1/3,lower.tail=FALSE)The tails of a probability distribution are the values at either end of the range of the random variable. The pbinom function normally assumes that you want the lower tail of the distribution, that is the probability of getting less than or equal to a specified value. The specification "lower.tail=FALSE" tells S to compute the upper tail of the distribution, that is the probability of getting a value greater than the argument. Compare all three methods and make sure that they give the same probabilities. Note: since the Binomial sample space only includes integer values, Pr(X < k) is the same as Pr(X <= k-1).
For continuous distributions, the "p" function and the "q" function are inverses (ie either one undoes the action of the other: try qnorm(pnorm(0)), or pnorm(qnorm(.5))). This doesn't work out quite so perfectly for the binomial distribution because of the discrete nature of the sample space. It is too "lumpy". Compare
qbinom(.5,6,1/3) pbinom(2,6,1/3)
For a binomial(200,1/2) distribution:
For a binomial(2000,1/2) distribution:
Note that these sets of computations pertain to the same range of sample proportions (number of successes divided by number of trials: 8/20 = 80/200 = 800/2000), but with different sample sizes. Remember the pbinom function gives Pr(X <= k), not Pr(X < k).
Check your answers: Solutions
dhyper(k,r,b,n) # hypergeometric density at k: Pr(X = k) # r = number of red balls in the urn # b = number of black balls in the urn # n = sample size phyper(k,r,b,n) # hypergeometric CDF at k: Pr(X <= k) rhyper(reps,r,b,n) # generate random samples # reps = number of repetitions of sampling # n draws from an urn with r reds and b blacks # value: a sample of size "reps" of numbers # of reds drawn in each trial of n draws
dpois(k, mu) # poisson density at k: Pr(X = k) ppois(k, mu) # poisson CDF at k: Pr(X <= k) rpois(n, mu) # generate n poisson(mu) random variables
Deaths = c(0,1,2,3,4,5) N = c(109,65,22,3,1,0) weighted.mean(Deaths,N) [1] 0.61"Deaths" correspond to the outcomes for a single unit, "N" is the number of cavalry units corresponding to each outcome (109 had 0 deaths, 65 had 1 death, etc.). Using the weighted.mean function is equivalent to computing the average of 109 0's, 65 1's, 22 2's, etc. You can verify this by createing the full dataset:
D1 <- rep(Deaths,N) mean(D1)
Homework Exercises:
|