Problem 4

Here is the t.test output for testing the null hypothesis of no difference between the heights of males and females at age two, using the Berkeley guidance study dataset accessible through the index page for the Math 141 lab notes. The expression "ht2~sex" is an instance of a "model formula", such as we will see when we do regression and analysis of variance. The response variable is on the left of the "~", the explanatory variable on the right. In the t.test function, it simply means "split the dataset into two subsets, one for each value of sex, and test the null hypothesis that the two subsets come from the same population".
> t.test(ht2~sex,data=Berkeley,var.equal=TRUE)

        Two Sample t-test

data:  ht2 by sex 
t = 1.0709, df = 56, p-value = 0.2888
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -0.8135485  2.6822985 
sample estimates:
  mean in group Male mean in group Female 
            88.40000             87.46563 
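
To see what the model formula is doing, here is a small sketch on simulated data. The data frame `toy` and its values are made up for illustration and are not the Berkeley dataset. The sketch compares the formula call with a call in which the two subsets are split out by hand; the two should give the same t statistic.

```r
# Hypothetical simulated data, NOT the Berkeley dataset
set.seed(1)
toy <- data.frame(
  sex = factor(rep(c("Male", "Female"), each = 20)),
  ht2 = rnorm(40, mean = 88, sd = 3)
)

# Formula interface: response ~ grouping variable
by_formula <- t.test(ht2 ~ sex, data = toy, var.equal = TRUE)

# The same test with the two subsets passed explicitly.
# factor() sorts levels alphabetically here, so "Female" is the first group.
female <- toy$ht2[toy$sex == "Female"]
male   <- toy$ht2[toy$sex == "Male"]
by_split <- t.test(female, male, var.equal = TRUE)

# TRUE if both calls produce the same t statistic
same_t <- isTRUE(all.equal(unname(by_formula$statistic),
                           unname(by_split$statistic)))
print(same_t)
```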

Suppose that there really is a difference of 1 cm in the mean heights of males and females at age two. Estimate the sample size you would need to give power at least 0.5 for detecting this difference. How big a sample would be necessary to achieve power 0.8? 0.95? You must make some assumption about the standard deviation of height at age two. Justify your choice. You may want to load the dataset and compute a standard deviation or two!
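
One possible way to set up the computation is with R's power.t.test function, sketched below. The value `sd_guess` is only a placeholder and is not computed from the Berkeley data; you still need to choose and justify your own value. The helper `n_per_group` is a hypothetical name introduced here, and the sketch assumes a two-sided, two-sample test at the 5% level with equal group sizes.

```r
# ASSUMPTION: placeholder standard deviation in cm, not taken from the data
sd_guess <- 3

# Hypothetical helper: sample size needed in EACH group for a
# two-sided, two-sample t test at significance level 0.05
n_per_group <- function(delta, power, sd = sd_guess) {
  res <- power.t.test(delta = delta, sd = sd, sig.level = 0.05,
                      power = power, type = "two.sample",
                      alternative = "two.sided")
  ceiling(res$n)  # round up to a whole number of subjects
}

powers <- c(0.5, 0.8, 0.95)
n_needed <- sapply(powers, function(p) n_per_group(delta = 1, power = p))
names(n_needed) <- powers
print(n_needed)
```

Note that power.t.test reports n as the number of observations in each group, so the total sample is twice that. The `delta` argument is the true difference in means, so other differences can be tried by changing it.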

Repeat the sample size computations for power 0.5, 0.8, and 0.95, assuming that the true difference is 2 cm.