Probability Computations Related to Binomial Distributions |
For a binomial(n,p) random variable X, the R functions involve the abbreviation "binom":
dbinom(k,n,p) # binomial(n,p) density at k: Pr(X = k)
pbinom(k,n,p) # binomial(n,p) CDF at k: Pr(X <= k)
qbinom(P,n,p) # binomial(n,p) P-th quantile
rbinom(N,n,p) # N binomial(n,p) random variables
help(Binomial) # documentation on the functions related
# to the Binomial distribution
choose(n,k)
For example, choose(4,2) should be 6.
Check your answer: Solutions
For a binomial(6,1/3) random variable named X, compute the probability of 2 successes, Pr(X=2):
> dbinom(2,6,1/3) [1] 0.3292181Compare to the direct computation (don't type all this into R, it's an equation not an R command :-):
Pr(X = 2) = ((6*5*4*3*2*1)/(2*1*4*3*2*1))*(1/3)^2*(2/3)^4
= 15*(1/3)^2*(2/3)^4
= 0.3292181
or in R:
choose(6,2)*(1/3)^2*(2/3)^4
We can compute the probabilities for the whole sample space at once if we like with:
dbinom(0:6,6,1/3)
A nicer way to display those probabilities is in a table,
using the data.frame() function:
data.frame(X = 0:6, Prob = dbinom(0:6,6,1/3))
When printing tables like this, it is often better to round off
the entries to fewer significant digits:
M <- data.frame(X = 0:6, Prob = dbinom(0:6,6,1/3))
round(M,4)
We can also produce the theoretical histogram for repeated trials of a given binomial experiment. Here is a function to draw the binomial density "curve", you can paste it in to your R session:
hist.binom <- function(n, p, ...)
{
# plots a relative frequency histogram of the binomial distribution
k <- 0:n
p <- dbinom(k, n, p)
names(p) <- as.character(0:n)
barplot(p,space=0, ...)
}
A relative frequency histogram is just a barplot with no space
between the bars, where
the total area is 1. The height of each bar here the probability
associated with that binomial outcome.
To use the hist.binom function, you must specify the values of
n and p. For example, to get the histogram of a binomial(6,1/3)
distribution, use
hist.binom(6,1/3)
Try experimenting with the color ("col") argument; use help(colors) or
colors() to see what colors R knows by name.
hist.binom(6,1/3,col="yellow")
First, set up the plot window to hold 4 plots using par(mfrow=c(2,2)). Then, plot the binomial densities for the following values of n and p:
n p
20 1/2
20 1/4
20 3/4
20 9/10
Where is the peak of the density
(the mode) located in each case? What is the relationship between the
mode and the values of n and p?
Check your answer: Solutions
For a binomial(6,1/3) random variable X, compute the probability that X is less than 3; in other words, Pr(X <= 2):
pbinom(2,6,1/3)
Compare to summing the density (ie adding up the areas under the
binomial histogram:
dbinom(0,6,1/3)+dbinom(1,6,1/3)+dbinom(2,6,1/3)
or
sum(dbinom(0:2,6,1/3))
We can find the probability that we get more than a given number of successes by subtraction: for the probability that X is greater than 2, Pr(X > 2):
1 - pbinom(2,6,1/3)
We could of course compute the probability of that event by summing
the individual probabilities:
sum(dbinom(3:6,6,1/3))
Finally, R has another option for this computation:
pbinom(2,6,1/3,lower.tail=FALSE)
The tails of a probability distribution are the values at either
end of the range of the random variable. The pbinom function normally
assumes that you want the lower tail of the distribution, that is the
probability of getting less than or equal to a specified value.
The specification "lower.tail=FALSE" tells S to compute the upper tail
of the distribution, that is the probability of getting a value greater
than the argument. Compare all three methods and make sure that
they give the same probabilities.
Note: since the Binomial sample space only includes integer values,
Pr(X < k) is the same as Pr(X <= k-1).
For continuous distributions, the "p" function and the "q" function are inverses (ie either one undoes the action of the other: try qnorm(pnorm(0)), or pnorm(qnorm(.5))). This doesn't work out quite so perfectly for the binomial distribution because of the discrete nature of the sample space. It is too "lumpy". Compare
qbinom(.5,6,1/3)
pbinom(2,6,1/3)
For a binomial(200,1/2) distribution:
For a binomial(2000,1/2) distribution:
Note that these sets of computations pertain to the same range of sample proportions (number of successes divided by number of trials: 8/20 = 80/200 = 800/2000), but with different sample sizes. Remember the pbinom function gives Pr(X <= k), not Pr(X < k).
Check your answers: Solutions
dhyper(k,r,b,n) # hypergeometric density at k: Pr(X = k)
# r = number of red balls in the urn
# b = number of black balls in the urn
# n = sample size
phyper(k,r,b,n) # hypergeometric CDF at k: Pr(X <= k)
rhyper(reps,r,b,n)
# generate random samples
# reps = number of repetitions of sampling
# n draws from an urn with r reds and b blacks
# value: a sample of size "reps" of numbers
# of reds drawn in each trial of n draws
dpois(k, mu) # poisson density at k: Pr(X = k)
ppois(k, mu) # poisson CDF at k: Pr(X <= k)
rpois(n, mu) # generate n poisson(mu) random variables
Deaths = c(0,1,2,3,4,5)
N = c(109,65,22,3,1,0)
weighted.mean(Deaths,N)
[1] 0.61
"Deaths" correspond to the outcomes for a single unit, "N" is the
number of cavalry units corresponding to each outcome (109 had 0 deaths,
65 had 1 death, etc.).
Using the weighted.mean function is equivalent to
computing the average of 109 0's, 65 1's, 22 2's, etc. You can verify
this by createing the full dataset:
D1 <- rep(Deaths,N)
mean(D1)
Homework Exercises:
|