Probability Distributions
    sample(1:100)
    # a random permutation of the numbers 1 to 100

    sample(x,10,replace=FALSE)
    # a random sample of size N=10
    # drawn from the elements of x without replacement

    sample(1:6,10,replace=TRUE,prob=c(1/7,1/7,1/7,1/7,1/7,2/7))
    # a random sample of size 10 from the digits 1 to 6
    # with unequal probabilities of selection
    # e.g. a loaded die
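To see the loaded-die weights at work, the following sketch rolls the die many times and compares the observed frequencies with the target probabilities. The seed, the number of rolls (7000) and the variable names are illustrative choices, not part of the original example.

```r
set.seed(1)  # arbitrary seed, only so the run is repeatable

# Selection probabilities for faces 1 to 6; face 6 is twice as likely
loaded <- c(1/7, 1/7, 1/7, 1/7, 1/7, 2/7)

# Many rolls of the loaded die (7000 is an arbitrary choice)
rolls <- sample(1:6, 7000, replace = TRUE, prob = loaded)

# Observed relative frequency of each face, next to the target probability
freq <- table(factor(rolls, levels = 1:6)) / length(rolls)
round(rbind(observed = as.numeric(freq), target = loaded), 3)
```

With this many rolls the observed row should sit close to the target row, with face 6 showing up about twice as often as any other face.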
rbinom(N,1,p) generates a sequence of N Bernoulli trials, each with probability p of success. Strictly speaking, R returns a sequence of 1's and 0's, but we can identify 1's with successes and 0's with failures if we like.
Try generating a sample of 50 Bernoulli trials with p = .1. What is the average number of 0's between consecutive 1's? What do you think it would be after a really large number of trials?
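One way to attempt this exercise is sketched below. The helper name mean_gap, the seed and the size of the "really large" run (100000 trials) are assumptions made for illustration. The run of 0's between two consecutive 1's has a geometric distribution, so the long-run average should approach (1-p)/p.

```r
# Hypothetical helper: average number of 0's between consecutive 1's
mean_gap <- function(x) {
  ones <- which(x == 1)                   # positions of the successes
  if (length(ones) < 2) return(NA_real_)  # no gap to measure
  mean(diff(ones) - 1)                    # 0's strictly between neighbouring 1's
}

set.seed(2)  # arbitrary seed for repeatability
p <- 0.1

small <- rbinom(50, 1, p)      # the 50 trials asked for in the exercise
large <- rbinom(100000, 1, p)  # a much longer run for comparison

mean_gap(small)  # varies a lot from one sample to the next
mean_gap(large)  # settles near (1 - p) / p
(1 - p) / p      # theoretical long-run value
```

The helper returns NA when a sample contains fewer than two 1's, which does happen occasionally with only 50 trials.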
rbinom(N,n,p) generates N independent samples from a binomial(n,p) distribution.
Generate 100 binomial(20,.5) random values and make a normal (quantile-quantile) plot. What do you see?
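A minimal sketch of this exercise follows, assuming the normal plot is meant to be drawn with qqnorm; the seed and the added qqline reference line are illustrative extras.

```r
set.seed(3)  # arbitrary seed for repeatability

# 100 draws from a binomial(20, .5) distribution
x <- rbinom(100, 20, 0.5)

# Normal quantile-quantile plot of the draws
qqnorm(x)

# Reference line through the first and third quartiles
qqline(x)
```

Because the values are integers, the points tend to form horizontal steps, and the steps should follow the reference line fairly closely.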
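rgeom(N,p) generates N geometric(p) counts. In R, each count is the number of failures before the first success.
rnbinom(N,r,p) generates N negative binomial(r,p) counts. Each count is the number of failures before the r-th success.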
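The sketch below checks the two generators against their theoretical means, (1-p)/p for the geometric and r*(1-p)/p for the negative binomial, under R's convention of counting failures. The values of p and r, the seed and the sample size are arbitrary illustrative choices.

```r
set.seed(5)  # arbitrary seed for repeatability
p <- 0.3     # illustrative success probability
r <- 4       # illustrative number of successes

g  <- rgeom(10000, p)       # failures before the first success
nb <- rnbinom(10000, r, p)  # failures before the r-th success

# Sample means next to the theoretical means
c(sample = mean(g),  theory = (1 - p) / p)
c(sample = mean(nb), theory = r * (1 - p) / p)

# With r = 1 the negative binomial probabilities match the geometric ones
all.equal(dnbinom(0:10, 1, p), dgeom(0:10, p))
```

The last line illustrates that the geometric distribution is the r = 1 case of the negative binomial.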
rpois(N,m) generates N values from a Poisson(m) distribution. To compare the Poisson distribution to a similar binomial distribution, set m=n*p. You can either compare the probabilities for each possible outcome directly or plot the cumulative distribution functions against each other. The sample space for a binomial(n,p) distribution is 0:n; let's try it for the binomial(20,.1) and Poisson(2) distributions:
    k <- 0:20
    bin <- dbinom(k,20,.1)
    pois <- dpois(k,2)
    plot(k,bin)
    points(k,pois,pch="+")

Analogous to a quantile-quantile plot, we can plot the cumulative probabilities of the two distributions against each other:
    k <- 0:20
    bin <- pbinom(k,20,.1)
    pois <- ppois(k,2)
    plot(bin,pois)
    abline(0,1)

If the two distributions agreed exactly, the plotted points would all lie on the straight line y = x. The abline function adds a line with the given intercept and slope, in this case 0 and 1 respectively.
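The same comparison can be summarised numerically instead of graphically. This sketch reports the largest absolute gap between the binomial(20,.1) and Poisson(2) probabilities, and between their cumulative distribution functions; the variable names are illustrative.

```r
k <- 0:20

# Largest difference between the individual outcome probabilities
max_pmf_gap <- max(abs(dbinom(k, 20, 0.1) - dpois(k, 2)))

# Largest difference between the cumulative distribution functions
max_cdf_gap <- max(abs(pbinom(k, 20, 0.1) - ppois(k, 2)))

c(pmf = max_pmf_gap, cdf = max_cdf_gap)
```

Small values here correspond to the points lying close to the line drawn by abline(0,1).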
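rhyper(N,r,b,n) generates N values from the hypergeometric(r,b,n) distribution. Each value is the number of items of the first type obtained in n draws without replacement from a population containing r items of the first type and b of the second.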
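As a rough check on the argument order, the sketch below draws from a small population and compares the sample mean with the theoretical mean n*r/(r+b). The population sizes, seed and sample size are illustrative assumptions.

```r
set.seed(6)  # arbitrary seed for repeatability

r <- 5   # items of the first type (illustrative)
b <- 15  # items of the second type (illustrative)
n <- 4   # number of draws without replacement (illustrative)

x <- rhyper(10000, r, b, n)

table(x)  # each value lies between 0 and n
c(sample = mean(x), theory = n * r / (r + b))
```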
rnorm(N,m,s) and m + s*rnorm(N) both generate N values from the N(m,s^2) distribution. Note that the third argument of rnorm is the standard deviation s, not the variance. The default values are m=0 and s=1, i.e. the standard normal distribution.
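The equivalence of the two forms can be checked by resetting the random seed before each call. The seed and the values m = 10, s = 2 below are arbitrary illustrative choices.

```r
m <- 10  # illustrative mean
s <- 2   # illustrative standard deviation

set.seed(4)
a <- rnorm(5, m, s)      # location and scale passed to rnorm

set.seed(4)
b <- m + s * rnorm(5)    # standard normal values shifted and scaled by hand

all.equal(a, b)          # compares the two vectors up to numerical tolerance
```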
runif(N,a,b) generates N values from the uniform(a,b) distribution. The default interval is (0,1).
rgamma(N,a) generates N values from the gamma(a) distribution. For the basic gamma distribution (with the default scale equal to 1), the shape parameter a is also the mean.
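A quick simulation illustrates the statement about the mean. The shape value, the seed, the sample size and the second example with scale = 2 are assumptions added for illustration; with a non-default scale the mean becomes shape times scale.

```r
set.seed(7)  # arbitrary seed for repeatability
a <- 3       # illustrative shape parameter

x <- rgamma(100000, a)             # default scale of 1
mean(x)                            # should be close to a

y <- rgamma(100000, a, scale = 2)  # same shape, scale of 2
mean(y)                            # should be close to a * 2
```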
There are several other standard distributions built into R. You can explore
these by opening the help browser and selecting the category
"probability distributions and random numbers".