Simulating Chi squared statisticsTo get a feel for the behavior of the chi-squared statistic under random sampling, we can do a simulation: generate random samples corresponding to a specified pattern of probablilities for a 2x2 table, compute the chi-squared statistics, and study their behavior.Later we will learn more about probability distributions like the multinomial. For now, it suffices to imagine that we have a table of probabilities:
rmultinom(1,100,c(1/4,1/4,1/4,1/4))We have done one replication of our experiment of 100 trials each with the specified probabilies. To use the chisq.test() function we need to put the data into a 2x2 matrix or table: matrix(rmultinom(1,100,c(1/4,1/4,1/4,1/4)),ncol=2)Every time we use the rmultinom() function we get a new sample. Finally, let's generate a sample and use the chisq.test() function: A = matrix(rmultinom(1,100,c(1/4,1/4,1/4,1/4)),ncol=2) chisq.test(A,correct=F)Or save the results in a new dataset: A = matrix(rmultinom(1,100,c(1/4,1/4,1/4,1/4)),ncol=2) B = chisq.test(A,correct=F) names(B) B B$statLook at the output from each step to make sure you understand it! Now, let's do our experiment 1000 times. There are various ways to do this, here is a simple one:
X = rep(0,1000) for(i in 1:1000) { A = matrix(rmultinom(1,100,c(1/4,1/4,1/4,1/4)),ncol=2) B = chisq.test(A,correct=F) X[i] = B$stat }Now look at X; too see the 1000 numbers, just enter the name of the dataset, X. That isn't very useful, is it? How about a histogram: hist(X)or just compute the 95-th percentile: quantile(X,.95)Notes: The chisq.test() function will return a missing value (actually "Inf") whenever there is a row or column that is entirely zeros. That is because the expected count is zero in that case, and the chi-squared statistic is computed as sum((O-E)^2/E). In the simulation you can replace these missing values by some rather large number to include them in the compuation. Make sure choose a number large enough so that you won't confuse it with a finite chi-squared value that might actually arise. X[is.na(X)]= 1000
Homework!
|