Random Permutations

A permutation of the integers {1,2,3,4,...,n} is the same numbers rearranged in a (possibly) different order. For example, the set {1,2,3} has 6 possible permutations:

 
           {1,2,3}, {2,3,1}, {3,1,2}
           {1,3,2}, {3,2,1}, {2,1,3}

You can count the number of permutations of a set of n elements in the following way: there are n choices for the first item, then (n-1) choices for the second item, and after choosing those two, (n-2) choices for the third item, and so on. Thus there are n factorial permutations of n items:

             n! = n*(n-1)*(n-2)*...*3*2*1.

For 3 items this expression gives 3*2*1=6, which is the number of permutations we counted above. As you can see, in a permutation, the order matters. The permutation {1,3,2} is different from the permutation {3,2,1}. R has a function that will compute n factorial:

             factorial(3)

The number of permutations increases rapidly with n. Try a larger value of n, say 20. It will help to display more digits.

             factorial(20)

             # set the digits option to display 18 digits
             options(digits=18)
             factorial(20)

             options(digits=7)

Software using floating point arithmetic has an upper bound for representable numbers. What is the largest integer for which R will return a number (not "Inf")?

Imagine putting the numbers 1,2,3,...,n on ping-pong balls, putting them in a bag, and mixing them thoroughly. Now draw the numbers one at a time, recording the order in which the numbers were selected. You are picking a random permutation!

The sample() function in R will do this for you. To pick a random sample of size n without replacement from the dataset x, use

sample(x,size=n,replace=FALSE)

Thus to get a random permutation of the set {1,2,3,4,5}, use

sample(1:5,size=5,replace=FALSE)

The "replace = FALSE" is not really necessary, that is the default choice.

Sampling without replacement means each member of the set can be selected only once. This is like dealing from a deck of cards; once you deal the 7 of clubs it isn't in the deck anymore! (Shuffling a deck of cards is just a way of producing a random permutation of the deck.)

Assignment:
There is a classic probability puzzle that involves permutations. Imagine that 20 people go the opera. Each person is wearing a hat. They check their hat at the door as they arrive. The hat-check person gets tipsy during the performance, and returns a random hat to each person as they leave. What is the probability that at least one person gets his or her own hat back? What is the average number of patrons who get their own hats back?

Note on Lab Assignments When preparing lab assignments, copy the R commands you used into a text file or your favorite word processor. Annotate with comments so that the reader can follow the thread. Print and submit with your assignment.
######################################################## # All comments embedded in R code should look like this # it will make your R commands much easier to follow ########################################################

Here is a method for getting an approximate answer: generate two random permutations of size 20. One represents the random order in which the patrons leave, the other the random order in which the hats are returned. (Teaser: if the hats are returned in random order, does it matter in which order the patrons leave?). Compute the number of matches. Repeat many times, and calculate the proportion of times that the number of matches was greater than zero.

              Hats <- sample(1:20,20)
              Heads <- sample(1:20,20)
              sum(Hats==Heads)

"Hats == Heads" yields a logical value (TRUE/FALSE), which R codes as 0 for FALSE, 1 for TRUE. Summing logical values is just counting the number which are TRUE. Look at the two datasets and verify the matches, if there were any.

              data.frame(Hats,Heads,Match= Heads==Hats)

That is one trial. To get an accurate estimate, you will need to repeat this many times, say 10000. You need to store the results in a dataset, which you create first:

              n.matches <- rep(0,10000)
                  # create dataset to store results
              Heads <- 1:20

              for(i in 1:10000){
                 Hats <- sample(1:20,20)
                 n.matches[i] <- sum(Hats==Heads)
              }

Now you can compute the proportion with at least one match, the average number of matches, or any other statistic of interest:

                # proportion with at least one match
              sum(n.matches > 0)/length(n.matches)

                # average number of matches
              mean(n.matches)

                # median number of matches
	      median(n.matches)

	        # standard deviation 
              sd(n.matches)

                # largest number of matches
              max(n.matches)

              hist(n.matches,breaks=seq(-.5,max(n.matches)+.5,1),freq=FALSE)

This simulation should run quickly. Repeat it several times to see how reliable your results are.

How do you think the proportion of trials with at least one match will change if there are 50 people? The average? Test your guess by repeating the simulation with 50 hats and 50 people.

How about 100 hats and heads?

Simulation is a powerful tool for attacking probability problems that are too difficult to solve analytically!

Math 141 Index
Another Simulation Experiment: Runs in Coin Tossing

Albyn Jones
September 2008