The Central Limit Theorem
2.4 Inferences About the Differences in Means, Randomized Designs
2.4.3 Choice of Sample Size
Selection of an appropriate sample size is one of the most important parts of any experimental design problem. One way to do this is to consider the impact of sample size on the estimate of the difference in two means. From Equation 2.30 we know that the $100(1 - \alpha)\%$ confidence interval on the difference in two means is a measure of the precision of estimation of the difference in the two means. The length of this interval is determined by
$$t_{\alpha/2,\,n_1+n_2-2}\,S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$$
We consider the case where the sample sizes from the two populations are equal, so that $n_1 = n_2 = n$. Then the length of the CI is determined by
$$t_{\alpha/2,\,2n-2}\,S_p\sqrt{\frac{2}{n}}$$
Consequently, the precision with which the difference in the two means is estimated depends on two quantities: $S_p$, over which we have no control, and $t_{\alpha/2,\,2n-2}\sqrt{2/n}$, which we can control by choosing the sample size $n$. Figure 2.12 is a plot of $t_{\alpha/2,\,2n-2}\sqrt{2/n}$ versus $n$ for $\alpha = 0.05$. Notice that the curve descends rapidly as $n$ increases up to about $n = 10$ and less rapidly beyond that. Since $S_p$ is relatively constant and $t_{\alpha/2,\,2n-2}\sqrt{2/n}$ isn’t going to change much for sample sizes beyond $n = 10$ or 12, we can conclude that choosing a sample size of $n = 10$ or 12 from each population in a two-sample 95% CI will give about the best precision of estimation for the difference in the two means that is possible given the inherent variability present in the two populations.
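The flattening of this multiplier can be checked numerically. The following is a minimal Python sketch, assuming SciPy is available; the function name `ci_length_factor` is illustrative and does not come from the text. It evaluates $t_{\alpha/2,\,2n-2}\sqrt{2/n}$ for several per-population sample sizes.

```python
from math import sqrt

from scipy.stats import t


def ci_length_factor(n: int, alpha: float = 0.05) -> float:
    """Return t_{alpha/2, 2n-2} * sqrt(2/n) for n observations per population.

    This is the part of the CI length that depends only on the sample size
    (the pooled standard deviation S_p is left out). Hypothetical helper name.
    """
    if n < 2:
        raise ValueError("need at least 2 observations per population")
    df = 2 * n - 2                      # degrees of freedom of the pooled t
    t_crit = t.ppf(1 - alpha / 2, df)   # upper alpha/2 percentage point
    return t_crit * sqrt(2 / n)


# Tabulate the factor to see where the curve flattens out.
for n in (2, 3, 5, 8, 10, 12, 15, 20):
    print(f"n = {n:2d}   t*sqrt(2/n) = {ci_length_factor(n):.3f}")
```

Under these assumptions the factor is about 4.30 at $n = 2$ and about 0.94 at $n = 10$, consistent with the range of the vertical axis in Figure 2.12.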
■ FIGURE 2.12 Plot of $t_{\alpha/2,\,2n-2}\sqrt{2/n}$ versus sample size in each population $n$ for $\alpha = 0.05$. Vertical axis: $t_{\alpha/2,\,2n-2}\sqrt{2/n}$ (about 0.5 to 4.5); horizontal axis: $n$ (0 to 20).
We can also use a hypothesis testing framework to determine sample size. The choice of sample size and the probability of type II error $\beta$ are closely connected. Suppose that we are testing the hypotheses
$$H_0: \mu_1 = \mu_2 \qquad H_1: \mu_1 \neq \mu_2$$
and that the means are not equal so that $\delta = \mu_1 - \mu_2$. Because $H_0: \mu_1 = \mu_2$ is not true, we are concerned about wrongly failing to reject $H_0$. The probability of type II error depends on the true difference in means $\delta$. A graph of $\beta$ versus $\delta$ for a particular sample size is called the operating characteristic curve, or O.C. curve, for the test. The $\beta$ error is also a function of sample size. Generally, for a given value of $\delta$, the $\beta$ error decreases as the sample size increases. That is, a specified difference in means is easier to detect for larger sample sizes than for smaller ones.
An alternative to the OC curve is a power curve, which typically plots power or $1 - \beta$ versus sample size for a specified difference in the means. Some software packages perform power analysis and will plot power curves. A set of power curves constructed using JMP for the hypotheses
$$H_0: \mu_1 = \mu_2 \qquad H_1: \mu_1 \neq \mu_2$$
is shown in Figure 2.13 for the case where the two population variances $\sigma_1^2$ and $\sigma_2^2$ are unknown but equal ($\sigma_1^2 = \sigma_2^2 = \sigma^2$) and for a level of significance of $\alpha = 0.05$. These power curves also assume that the sample sizes from the two populations are equal and that the sample size shown on the horizontal scale (say $n$) is the total sample size, so that the sample size in each population is $n/2$. Also notice that the difference in means is expressed as a ratio to the common standard deviation; that is,
$$\delta = \frac{|\mu_1 - \mu_2|}{\sigma}$$
From examining these curves we observe the following:
1. The greater the difference in means $\mu_1 - \mu_2$, the higher the power (smaller type II error probability). That is, for a specified sample size and significance level $\alpha$, the test will detect large differences in means more easily than small ones.
2. As the sample size gets larger, the power of the test gets larger (the type II error probability gets smaller) for a given difference in means and significance level $\alpha$. That is, to detect a specified difference in means we may make the test more powerful by increasing the sample size.
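Both observations can be reproduced numerically. The sketch below assumes SciPy and uses the standard noncentral t result for the two-sided pooled t-test with equal group sizes: under the alternative, the test statistic has noncentrality parameter $\delta\sqrt{n/2}$, where $n$ is the per-population sample size and $\delta = |\mu_1 - \mu_2|/\sigma$. The function name `two_sample_t_power` is illustrative, and this is not the JMP computation itself.

```python
from math import sqrt

from scipy.stats import nct, t


def two_sample_t_power(delta: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Power of the two-sided pooled t-test with equal group sizes.

    delta is the standardized difference |mu1 - mu2| / sigma.
    """
    df = 2 * n_per_group - 2
    ncp = delta * sqrt(n_per_group / 2)   # noncentrality parameter
    t_crit = t.ppf(1 - alpha / 2, df)
    # H0 is rejected when |t0| > t_crit; under H1, t0 is noncentral t.
    return nct.sf(t_crit, df, ncp) + nct.cdf(-t_crit, df, ncp)


# Rows: standardized difference; columns: total sample size (2 * n per group).
totals = (10, 20, 30, 40, 50)
for delta in (1.0, 1.5, 2.0):
    powers = [two_sample_t_power(delta, total // 2) for total in totals]
    cells = "  ".join(f"{p:.3f}" for p in powers)
    print(f"delta = {delta:3.1f}:  {cells}")
```

Each printed row should increase from left to right (larger samples), and each column should increase from top to bottom (larger differences). The type II error probability is 1 minus the returned power.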
Operating characteristic curves and power curves are often helpful in selecting a sample size to use in an experiment. For example, consider the Portland cement mortar problem discussed previously.
Suppose that a difference in mean strength of 0.5 kgf/cm² has practical impact on the use of the mortar, so if the difference in means is at least this large, we would like to detect it with a high probability. Thus, because $\mu_1 - \mu_2 = 0.5$ kgf/cm² is the “critical” difference in means that we wish to detect, we find that the power curve parameter would be $\delta = 0.5/\sigma$. Unfortunately, $\delta$ involves the unknown standard deviation $\sigma$. However, suppose on the basis of past experience we think that it is very unlikely that the standard deviation will exceed 0.25 kgf/cm². Then substituting $\sigma = 0.25$ kgf/cm² into the expression for $\delta$ results in $\delta = 2$. If we wish to reject the null hypothesis when the difference in means $\mu_1 - \mu_2 = 0.5$ with probability at least 0.95 (power = 0.95) with $\alpha = 0.05$, then referring to Figure 2.13 we find that the required sample size on the horizontal axis is 16, approximately. This is the total sample size, so the sample size in each population should be
$$n = 16/2 = 8$$
In our example, the experimenter actually used a sample size of 10. The experimenter could have elected to increase the sample size slightly to guard against the possibility that the prior estimate of the common standard deviation $\sigma$ was too low and that the true value is somewhat larger than 0.25 kgf/cm².
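To see how sensitive the required sample size is to the prior guess at $\sigma$, the search can be repeated for a few candidate values. The following is a sketch under the same assumptions as the text (two-sided pooled t-test, equal group sizes, $\alpha = 0.05$, critical difference 0.5 kgf/cm²). It assumes SciPy; the function names and the alternative $\sigma$ values are illustrative choices, not taken from the example.

```python
from math import sqrt

from scipy.stats import nct, t


def power(diff: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Power of the two-sided pooled t-test with n observations per group."""
    df = 2 * n - 2
    ncp = (diff / sigma) * sqrt(n / 2)    # noncentrality parameter
    t_crit = t.ppf(1 - alpha / 2, df)
    return nct.sf(t_crit, df, ncp) + nct.cdf(-t_crit, df, ncp)


def smallest_n(diff: float, sigma: float, target: float,
               alpha: float = 0.05, n_max: int = 10_000) -> int:
    """Smallest per-group n whose power reaches the target (linear search)."""
    for n in range(2, n_max + 1):
        if power(diff, sigma, n, alpha) >= target:
            return n
    raise ValueError("target power not reached within n_max")


# sigma = 0.25 is the value used in the text; the others are hypothetical.
for sigma in (0.20, 0.25, 0.30, 0.35):
    n_req = smallest_n(0.5, sigma, target=0.95)
    print(f"sigma = {sigma:.2f}: required n per group = {n_req}, "
          f"power at n = 10 is {power(0.5, sigma, 10):.3f}")
```

A larger assumed $\sigma$ shrinks the standardized difference, so the required sample size grows. The linear search is a deliberate simplification that keeps the result an integer sample size.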
Operating characteristic curves often play an important role in the choice of sample size in experimental design problems. Their use in this respect is discussed in subsequent chapters. For a discussion of the uses of operating characteristic curves for other simple comparative experiments similar to the two-sample t-test, see Montgomery and Runger (2011).
Many statistics software packages can also assist the experimenter in performing power and sample size calculations. The following boxed display illustrates several computations for the Portland cement mortar problem from the power and sample size routine for the two-sample t-test in Minitab. The first section of output repeats the analysis performed with the OC curves: find the sample size necessary for detecting the critical difference in means of 0.5 kgf/cm², assuming that the standard deviation of strength is 0.25 kgf/cm². Notice that the answer obtained from Minitab, $n_1 = n_2 = 8$, is identical to the value obtained from the OC curve analysis. The second section of the output computes the power for the case where the critical difference in means is much smaller, only 0.25 kgf/cm². The power has dropped considerably, from over 0.95 to 0.562. The final section determines the sample sizes that would be necessary to detect an actual difference in means of 0.25 kgf/cm² with a power of at least 0.9. The required sample size turns out to be considerably larger, $n_1 = n_2 = 23$.
Power and Sample Size 2-Sample t-Test
Testing mean 1 = mean 2 (versus not =)
Calculating power for mean 1 = mean 2 + difference
Alpha = 0.05   Sigma = 0.25
Difference   Sample Size   Target Power   Actual Power
0.5          8             0.9500         0.9602

Power and Sample Size 2-Sample t-Test
Testing mean 1 = mean 2 (versus not =)
Calculating power for mean 1 = mean 2 + difference
Alpha = 0.05   Sigma = 0.25
Difference   Sample Size   Power
0.25         10            0.5620

Power and Sample Size 2-Sample t-Test
Testing mean 1 = mean 2 (versus not =)
Calculating power for mean 1 = mean 2 + difference
Alpha = 0.05   Sigma = 0.25
Difference   Sample Size   Target Power   Actual Power
0.25         23            0.9000         0.9125
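The three Minitab results can be cross-checked with a short script. This is a sketch, not Minitab's own algorithm: it assumes SciPy and computes power from the noncentral t distribution for the two-sided pooled t-test with equal group sizes. The function names `power` and `sample_size` are illustrative.

```python
from math import sqrt

from scipy.stats import nct, t

ALPHA = 0.05   # significance level from the display
SIGMA = 0.25   # assumed standard deviation, kgf/cm^2


def power(diff: float, n: int, sigma: float = SIGMA, alpha: float = ALPHA) -> float:
    """Power of the two-sided pooled t-test with n observations per group."""
    df = 2 * n - 2
    ncp = (diff / sigma) * sqrt(n / 2)    # noncentrality parameter
    t_crit = t.ppf(1 - alpha / 2, df)
    return nct.sf(t_crit, df, ncp) + nct.cdf(-t_crit, df, ncp)


def sample_size(diff: float, target: float) -> int:
    """Smallest per-group n with power at least the target."""
    n = 2
    while power(diff, n) < target:
        n += 1
    return n


# Section 1: sample size for difference 0.5 and target power 0.95.
n1 = sample_size(0.5, 0.95)
print(f"difference 0.50, target 0.95: n = {n1}, actual power = {power(0.5, n1):.4f}")

# Section 2: power for difference 0.25 with n = 10 per group.
print(f"difference 0.25, n = 10: power = {power(0.25, 10):.4f}")

# Section 3: sample size for difference 0.25 and target power 0.90.
n3 = sample_size(0.25, 0.90)
print(f"difference 0.25, target 0.90: n = {n3}, actual power = {power(0.25, n3):.4f}")
```

If the assumptions match those of the Minitab routine, the script should return the same sample sizes (8 and 23) and powers close to the values in the display.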
■ FIGURE 2.13 Power curves (from JMP) for the two-sample t-test assuming equal variances and $\alpha = 0.05$. The sample size on the horizontal axis is the total sample size, so the sample size in each population is $n$ = (sample size from graph)/2. Curves are shown for $\delta = |\mu_1 - \mu_2|/\sigma = 1$, 1.5, and 2. Vertical axis: power (0.00 to 1.00); horizontal axis: sample size (10 to 50).