Random Numbers

Probability Distributions


Discrete Distributions

Finite populations We can generate samples from a given set of objects in R using the sample function.

sample(1:100) 
	# a random permutation of the numbers 1 to 100

sample(x,10,replace=FALSE)
	# a random sample of size 10
	# drawn from the elements of x without replacement

sample(1:6,10,replace=TRUE,prob=c(1/7,1/7,1/7,1/7,1/7,2/7))
	# a random sample of size 10 from the integers 1 to 6
	# with unequal probabilities of selection
	# e.g. a loaded die
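
As a rough check on the loaded-die example, the following sketch tabulates a large sample and compares the observed proportions with the specified probabilities. The seed and the sample size of 7000 are arbitrary choices made here for illustration.

```r
set.seed(141)                           # arbitrary seed, only for reproducibility
p <- c(1/7, 1/7, 1/7, 1/7, 1/7, 2/7)    # selection probabilities for faces 1..6
rolls <- sample(1:6, 7000, replace = TRUE, prob = p)

# observed proportion of each face; factor() keeps all six levels in the table
obs <- as.vector(table(factor(rolls, levels = 1:6))) / length(rolls)

# expected and observed proportions side by side
round(rbind(expected = p, observed = obs), 3)
```

The observed proportion for face 6 should come out at roughly twice that of any other face.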

Bernoulli(p) Trials Bernoulli trials are sequences of independent dichotomous trials, each with probability p of success. The sample space consists of the two possible outcomes. For example, the results of 10 successive (independent) tosses of a fair coin would constitute 10 Bernoulli(1/2) trials. In R:
```
	rbinom(N,1,p)
```
generates a sequence of N trials, each with probability p of success. Actually, R returns a sequence of 1's and 0's, but we can identify 1's with successes and 0's with failures if we like.
Try generating a sample of 50 Bernoulli trials, with p = .1. What is the average number of 0's between successive 1's? What do you think it would be after a really large number of trials?
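
One possible way to approach the exercise for a long run of trials is sketched below. The seed, the run length of 100000, and the variable names are assumptions made for illustration; the gaps between successes are found from the positions of the 1's.

```r
set.seed(141)                 # arbitrary seed for reproducibility
p <- 0.1
x <- rbinom(100000, 1, p)     # a long run of Bernoulli(p) trials
ones <- which(x == 1)         # positions of the successes
gaps <- diff(ones) - 1        # number of 0's between consecutive 1's
mean(gaps)                    # compare with (1 - p)/p
```

The gaps are counts of failures before the next success, so their long-run average should be close to (1 - p)/p, which is 9 when p = .1.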
Binomial(n,p) This distribution arises from a sequence of n independent Bernoulli trials, each with probability p of success. Since the number of successes in n trials can range from 0 to n, the sample space is just the integers 0,1,2, ... n. For example, the number of heads that occur in 20 independent tosses of a fair coin has a Binomial(20,.5) distribution. In R:
```
	rbinom(N,n,p)
```
generates N independent samples from a binomial(n,p) distribution.
Generate 100 binomial(20,.5) random values and make a normal quantile plot (for example with qqnorm). What do you see?
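
Since a binomial count is the number of successes in n Bernoulli trials, the two generators can be compared directly. The sketch below is illustrative only (the seed, the sample size, and the names direct, trials and summed are assumptions); it checks both versions against the binomial mean n*p and variance n*p*(1-p).

```r
set.seed(141)                    # arbitrary seed for reproducibility
N <- 10000; n <- 20; p <- 0.5

direct <- rbinom(N, n, p)        # N binomial(n,p) counts in one call

# each row holds n Bernoulli(p) trials; the row sum counts the successes
trials <- matrix(rbinom(N * n, 1, p), nrow = N)
summed <- rowSums(trials)

c(mean(direct), mean(summed), n * p)              # means
c(var(direct), var(summed), n * p * (1 - p))      # variances
```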
Geometric(p) The geometric distribution models the number of failures before the first success in a sequence of independent Bernoulli trials, each with probability p of success. Note: the geometric is sometimes defined as the number of trials up to and including the first success; R uses the number of failures. Since we might have to repeat the experiment arbitrarily many times before the first success, the sample space is the non-negative integers: 0,1,2,... For example, if we roll a fair die and count the number of rolls before the first "6" appears, we have a geometric distribution with p = 1/6. In other words, if the sequence of rolls yields {4,2,4,5,6,...}, then we had 4 rolls before the first 6. In R:
```
	rgeom(N,p)
```
generates N geometric(p) counts.
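
To see that rgeom counts failures rather than trials, the die-rolling description can be simulated by hand and compared with the built-in generator. This is only a sketch: the helper rolls_before_six, the seed, and the sample size are hypothetical choices made for illustration.

```r
set.seed(141)                 # arbitrary seed for reproducibility
p <- 1/6

# hypothetical helper: roll a fair die until the first 6 appears
# and return the number of rolls that came before it
rolls_before_six <- function() {
  count <- 0
  while (sample(1:6, 1) != 6) count <- count + 1
  count
}

by_hand  <- replicate(5000, rolls_before_six())
built_in <- rgeom(5000, p)

# both sample means should be near (1 - p)/p = 5
c(mean(by_hand), mean(built_in), (1 - p) / p)
```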
Negative Binomial(r,p) The negative binomial distribution models the number of failures before the r-th success in a sequence of independent Bernoulli trials, each with probability p of success. Like the geometric distribution, the sample space for the negative binomial distribution is the non-negative integers. For example, if we roll a fair die and count the number of "non-6" rolls before the third "6" appears, we have a negative binomial(3,1/6) distribution. In particular, the outcome 0 would refer to the sequence 6,6,6; i.e. no failures before the third success. In R:
```
	rnbinom(N,r,p)
```
generates N negative binomial(r,p) counts.
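
The failures before the r-th success can be split into the failures before each successive success, so a negative binomial(r,p) count behaves like a sum of r independent geometric(p) counts. A minimal sketch of that relationship, with an arbitrary seed and sample size and illustrative variable names:

```r
set.seed(141)                 # arbitrary seed for reproducibility
N <- 10000; r <- 3; p <- 1/6

direct <- rnbinom(N, r, p)    # N negative binomial(r,p) counts

# r columns of geometric(p) counts; each row sum is the number of
# failures before the r-th success
summed <- rowSums(matrix(rgeom(N * r, p), ncol = r))

# both sample means should be near r*(1 - p)/p = 15
c(mean(direct), mean(summed), r * (1 - p) / p)
```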
Poisson(m) The Poisson distribution arises as an approximation to the binomial(n,p) for large n and small p, hence the common reference to modelling "rare events". The sample space for the Poisson distribution is the non-negative integers. For example, the number of phone calls arriving at a switchboard in a given time interval is likely to be approximately Poisson. The parameter m is the mean count. In R:
```
	rpois(N,m)
```
generates N values from a Poisson(m) distribution. To compare the Poisson distribution to a similar binomial distribution, set m=n*p. You can either compare the probabilities for each possible outcome directly, or plot the cumulative distribution functions against each other. The sample space for a binomial(n,p) distribution is 0:n; let's try it for the binomial(20,.1) and Poisson(2) distributions:
```
	k <- 0:20
	bin <- dbinom(k,20,.1)
	pois <- dpois(k,2)
	plot(k,bin)
	points(k,pois,pch="+")
```
Analogous to a quantile-quantile plot, we can plot the cumulative probabilities of the two distributions against each other:
```
	k <- 0:20
	bin <- pbinom(k,20,.1)
	pois <- ppois(k,2)
	plot(bin,pois)
	abline(0,1)
```
If the two distributions agreed exactly, the plotted points would all lie on the line through the origin with slope 1. The abline function adds a line with the given intercept and slope, in this case 0 and 1 respectively.
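
The comparison can also be made numerically. The sketch below computes the largest pointwise difference between the binomial and Poisson probabilities, first for the binomial(20,.1) used above and then for a binomial(200,.01), which has the same mean but larger n and smaller p. The second case and the names gap20 and gap200 are additions made here for illustration.

```r
k <- 0:20
gap20 <- max(abs(dbinom(k, 20, 0.1) - dpois(k, 2)))

# same mean m = n*p = 2, but with larger n and smaller p
k2 <- 0:200
gap200 <- max(abs(dbinom(k2, 200, 0.01) - dpois(k2, 2)))

c(gap20, gap200)   # the second gap should be the smaller of the two
```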
Hypergeometric(r,b,n) The hypergeometric distribution arises from sampling without replacement from finite populations. Conceptually, it corresponds to pulling n balls out of a bag that contains r red balls and b blue balls, and counting the number of red balls drawn. In R:
```
	rhyper(N,r,b,n)
```
generates N values from the hypergeometric(r,b,n) distribution.
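
The bag-of-balls description can be reproduced with the sample function and compared with rhyper. This is a sketch under assumed values (r = 5, b = 15, n = 6, an arbitrary seed, and illustrative variable names), not a definitive check.

```r
set.seed(141)                 # arbitrary seed for reproducibility
r <- 5; b <- 15; n <- 6

bag <- c(rep("red", r), rep("blue", b))

# draw n balls without replacement and count the red ones, many times over
by_hand  <- replicate(5000, sum(sample(bag, n) == "red"))
built_in <- rhyper(5000, r, b, n)

# both sample means should be near n*r/(r + b) = 1.5
c(mean(by_hand), mean(built_in), n * r / (r + b))
```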

Continuous Distributions

Normal(m,s^2) The normal distribution with mean m and variance s^2 (standard deviation s); the sample space is the whole real line. In R:
```
	rnorm(N,m,s)
```
and
```
	m + s*rnorm(N)
```
both generate N values from the N(m,s^2) distribution. Note that rnorm takes the standard deviation s, not the variance s^2. The default values are m=0 and s=1, i.e. the standard normal distribution.
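
A small sketch of the equivalence of the two forms: resetting the seed makes both calls use the same stream of random numbers, so the results can be compared value by value. The seed and the values m = 10, s = 2 are arbitrary choices made for illustration.

```r
set.seed(141)                 # arbitrary seed
a <- rnorm(5, 10, 2)          # mean 10, standard deviation 2

set.seed(141)                 # reset so both calls use the same random stream
b <- 10 + 2 * rnorm(5)        # shift and scale standard normal values

all.equal(a, b)               # compares the two vectors up to rounding error

# the third argument is a standard deviation, not a variance
sd_check <- sd(rnorm(10000, 10, 2))
sd_check                      # should be near 2 rather than 4
```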
Uniform(a,b) The uniform distribution on the interval (a,b). The uniform distribution assigns equal probability to sub-intervals of equal length. The density is flat over the interval (a,b). In R:
```
	runif(N,a,b)
```
generates N values from the uniform(a,b) distribution. The default interval is (0,1).
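
To illustrate the "equal probability for sub-intervals of equal length" property, the sketch below estimates the probability of two sub-intervals of length 0.5 inside (2,5). The interval, the seed, and the names left and right are assumptions chosen for the example.

```r
set.seed(141)                 # arbitrary seed for reproducibility
u <- runif(10000, 2, 5)       # uniform on the interval (2,5)

# two sub-intervals of equal length 0.5
left  <- mean(u > 2.0 & u < 2.5)
right <- mean(u > 4.0 & u < 4.5)

# both proportions should be near 0.5/(5 - 2)
c(left, right, 0.5 / 3)
```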
Gamma(a) The gamma distribution with shape parameter a is useful for modelling waiting times; it takes only positive values and is right skewed. In R:
```
	rgamma(N,a)
```
generates N values from the gamma(a) distribution. For the basic gamma distribution (with default scale equal to 1) the shape parameter a is also the mean.
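
A brief sketch of how the mean depends on the shape and scale: with the default scale of 1 the sample mean should be near a, and with scale 2 it should be near 2*a. The value a = 3, the seed, and the sample size are arbitrary choices made here; scale is a named argument of rgamma.

```r
set.seed(141)                       # arbitrary seed for reproducibility
a <- 3

x <- rgamma(10000, a)               # scale defaults to 1
y <- rgamma(10000, a, scale = 2)    # same shape, scale 2

c(mean(x), a)                       # mean is shape * scale = 3
c(mean(y), a * 2)                   # mean is shape * scale = 6
min(x) > 0                          # gamma values are strictly positive
```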

There are several other standard distributions built into R. You can explore these by opening the help browser and selecting the category "probability distributions and random numbers".

Math 141 Index
Introduction to S

Albyn Jones
Oct 2005