Choice of Sample Size - Inferences About the Differences in Means, Randomized Designs

The Central Limit Theorem

2.4 Inferences About the Differences in Means, Randomized Designs

2.4.3 Choice of Sample Size

Selection of an appropriate sample size is one of the most important parts of any experimental design problem. One way to do this is to consider the impact of sample size on the estimate of the difference in two means. From Equation 2.30 we know that the 100(1 – )% conﬁdence interval on the difference in two means is a measure of the precision of estimation of the difference in the two means. The length of this interval is determined by

We consider the case where the sample sizes from the two populations are equal, so that n₁ n₂n. Then the length of the CI is determined by

t_{/2, n}₁_n₂₂Sp

ⁿ¹¹ⁿ¹²

0.55 1 2 0.01

0.280.27 1 2 0.280.27

16.7617.04(2.101)0.28410¹ 10¹

16.7617.04(2.101)0.28410¹ 10¹ 12

y1y2t_/2,n₁_n₂₂Sp

ⁿ¹¹ ⁿ¹²

y1y2t_/2,n₁_n₂₂Sp

ⁿ¹¹ ⁿ¹² ¹ ²

y₁y₂t_/2,n₁_n₂₂S_p

ⁿ¹¹ ⁿ¹²

^y¹^y²^t^/2,n¹ⁿ²²^S^p

ⁿ¹¹ ⁿ¹² ¹²

^t^/2,n¹ⁿ²² ^y¹S_p^y

²ⁿ¹¹⁽¹ⁿ¹²²⁾ ^t^/2,n¹ⁿ²²

Consequently the precision with which the difference in the two means is estimated depends on two quantities—Sp, over which we have no control, and , which we can control by choosing the sample size n. Figure 2.12 is a plot of versus nfor = 0.05. Notice that the curve descends rapidly as nincreases up to about n= 10 and less rapidly beyond that. Since Sp is relatively constant and isn’t going to change much for sample sizes beyond n 10 or 12, we can conclude that choosing a sample size of n 10 or 12 from each population in a two-sample 95% CI will result in a CI that results in about the best precision of estimation for the difference in the two means that is possible given the amount of inherent variability that is present in the two populations.

We can also use a hypothesis testing framework to determine sample size. The choice of sample size and the probability of type II error are closely connected. Suppose that we are testing the hypotheses

and that the means are notequal so that 12. Because H0:12is not true, we are concerned about wrongly failing to reject H0. The probability of type II error depends on the true difference in means . A graph of versus for a particular sample size is called the oper- ating characteristic curve, or O.C. curvefor the test. The error is also a function of sample size. Generally, for a given value of , the error decreases as the sample size increases. That is, a speciﬁed difference in means is easier to detect for larger sample sizes than for smaller ones.

An alternative to the OC curve is a power curve, which typically plots power or 1 ‚ versus sample size for a speciﬁed difference in the means. Some software packages perform power analysis and will plot power curves. A set of power curves constructed using JMP for the hypotheses

is shown in Figure 2.13 for the case where the two population variances and are unknown but equal (²1²2²) and for a level of signiﬁcance of 0.05. These power²1 ²2

H₁⬊1 Z 2

H₀⬊12

H1⬊1 Z 2

H0⬊12

t_{/2, 2n}22n t_{/2, 2n}22n t_{/2, 2n}22n t_{/2, 2n}2Sp

²ⁿ

2.4 Inferences About the Differences in Means, Randomized Designs

45

■ F I G U R E 2 . 1 2 Plot of t_{/2, 2n}₂2nversus sample size in each population nfor 0.05

t*sqrt (2/n)

4.5 4.0 3.5 3.0 2.5 2.0 1.5 1.0 0.5

0 5 10

n 15 20

curves also assume that the sample sizes from the two populations are equal and that the sample size shown on the horizontal scale (say n) is the total sample size, so that the sample size in each population is n/2. Also notice that the difference in means is expressed as a ratio to the common standard deviation; that is

From examining these curves we observe the following:

1. The greater the difference in means , the higher the power (smaller type II error probability). That is, for a speciﬁed sample size and signiﬁcance level , the test will detect large differences in means more easily than small ones.

2. As the sample size get larger, the power of the test gets larger (the type II error probability gets smaller) for a given difference in means and signiﬁcance level . That is, to detect a speciﬁed difference in means we may make the test more pow- erful by increasing the sample size.

Operating curves and power curves are often helpful in selecting a sample size to use in an experiment. For example, consider the Portland cement mortar problem discussed previously.

Suppose that a difference in mean strength of 0.5 kgf/cm²has practical impact on the use of the mortar, so if the difference in means is at least this large, we would like to detect it with a high probability. Thus, because kgf/cm²is the “critical” difference in means that we wish to detect, we ﬁnd that the power curve parameter would be . Unfortunately, involves the unknown standard deviation . However, suppose on the basis of past experience we think that it is very unlikely that the standard deviation will exceed 0.25 kgf/cm². Then substituting kgf/cm²into the expression for results in . If we wish to reject the null hypothesis when the difference in means with probability at least 0.95 (power = 0.95) with , then referring to Figure 2.13 we ﬁnd that the required sample size on the horizontal axis is 16, approximately. This is the total sample size, so the sample size in each population should be

In our example, the experimenter actually used a sample size of 10. The experimenter could have elected to increase the sample size slightly to guard against the possibility that the prior estimate of the common standard deviation was too conservative and was likely to be some- what larger than 0.25.

Operating characteristic curves often play an important role in the choice of sample size in experimental design problems. Their use in this respect is discussed in subsequent chap- ters. For a discussion of the uses of operating characteristic curves for other simple compar- ative experiments similar to the two-sample t-test, see Montgomery and Runger (2011).

Many statistics software packages can also assist the experimenter in performing power and sample size calculations. The following boxed display illustrates several computations for the Portland cement mortar problem from the power and sample size routine for the two-sample t test in Minitab. The first section of output repeats the analysis performed with the OC curves; find the sample size necessary for detecting the critical difference in means of 0.5 kgf/cm², assuming that the standard deviation of strength is 0.25 kgf/cm². Notice that the answer obtained from Minitab,n1n28, is identical to the value obtained from the OC curve analysis. The second section of the output computes the power for the case where the critical difference in means is much smaller; only 0.25 kgf/cm². The power has dropped considerably, from over 0.95 to 0.562. The final section determines the sample sizes that would be necessary to detect an actual difference in means of 0.25 kgf/cm²with a power of at least 0.9. The required sample size turns out to be considerably larger,n1n223.

n16/28.

0.05

120.5 2

0.25

120.5 0.5/

1 2

2.4 Inferences About the Differences in Means, Randomized Designs

47

Power and Sample Size 2-Sample t-Test

Testing mean 1mean 2 (versus not)

Calculating power for mean 1mean 2difference Alpha0.05 Sigma0.25

Sample Target Actual

Difference Size Power Power

0.5 8 0.9500 0.9602

Power and Sample Size 2-Sample t-Test

Testing mean 1mean 2 (versus not )

Calculating power for mean 1mean 2difference Alpha0.05 Sigma0.25

Sample

Difference Size Power

0.25 10 0.5620

Power and Sample Size 2-Sample t-Test

Testing mean 1mean 2 (versus not )

Calculating power for mean 1mean 2difference Alpha0.05 Sigma0.25

Sample Target Actual

Difference Size Power Power

0.25 23 0.9000 0.9125

■ F I G U R E 2 . 1 3 Power Curves (from JMP) for the Two-Sample t-Test Assuming Equal Varianes and 0.05. The Sample Size on the Horizontal Axis is the Total sample Size, so the sample Size in Each population is nsample size from graph/2.

1.00

0.75

δ = 2 δ = 1.5

δ = = 1

0.50

Power

0.25

0.00

10 20 30

Sample Size

40 50

|μ1–μ2| σ

Dalam dokumen Design and Analysis - of Experiments - Spada UNS (Halaman 60-64)