
associated methods for inference concerning the population proportion can be extended for application to continuous random variables and parameters such as population means and totals. In this Chapter we present some of the formulae necessary to determine sample sizes for estimating and testing hypotheses about the population mean.

Statistical Methods for Sample Size Determination

Example 1.7.2

Consider the data in Example 1.7.1, but this time we will determine the sample size necessary to be 95% confident of estimating the average retail price of twenty tablets of the tranquilizer in the population of all pharmacies to within 5% (not 10 cents) of the true value, if, based on the pilot survey data, we believe that the true price should be about $1.00.

Solution

Assuming \(\sigma^2 = (0.85)^2\), and using formula (21) above,

\[ n = \frac{(1.960)^2 (0.85)^2}{(0.05)^2 (1.00)^2} = 1110.22. \]

Hence 1111 pharmacies should be sampled in order to be 95% confident that the resulting estimate will fall between $0.95 and $1.05 if the true average price is $1.00.
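The arithmetic in this solution can be checked with a few lines of Python. This is only a sketch of the expression as it is applied in the worked solution; the function name and the parameter name `epsilon` (the relative precision, here 5%) are illustrative choices, not notation from the text.

```python
import math

def n_for_relative_precision(z, sigma, mu, epsilon):
    """Sample size to estimate a mean to within a fraction `epsilon` of mu.

    Implements n = z^2 * sigma^2 / (epsilon^2 * mu^2), the expression used
    in the worked solution. Returns the raw value and the value rounded up.
    """
    n = (z ** 2) * (sigma ** 2) / ((epsilon ** 2) * (mu ** 2))
    return n, math.ceil(n)

# Values from Example 1.7.2: z = 1.960 (95% confidence), sigma = 0.85,
# anticipated mean price $1.00, relative precision 5%.
raw, n = n_for_relative_precision(z=1.960, sigma=0.85, mu=1.00, epsilon=0.05)
print(round(raw, 2), n)  # 1110.22 1111
```

The result is always rounded up, since rounding down would give slightly less than the stated confidence.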


In the above situations, our primary aim was estimation of the population mean. We now consider sample size determination when there is an underlying hypothesis which is to be tested.

Hypothesis testing - one population mean

Suppose we would like to test the hypothesis

\[ H_0: \mu = \mu_0 \]

versus the alternative hypothesis

\[ H_a: \mu > \mu_0 \]

and we would like to fix the level of the type I error to equal \(\alpha\) and the type II error to equal \(\beta\). That is, we want the power of the test to equal \(1-\beta\). Without loss of generality, we will denote the actual \(\mu\) in the population as \(\mu_a\). Following the same development as was done with respect to hypothesis testing for the population proportion (with the additional assumption that the variance of \(\bar{x}\) is equal to \(\sigma^2/n\) under both \(H_0\) and \(H_a\)), the necessary sample size for this single-sample hypothesis testing situation is given by the formula:

\[ n = \frac{\sigma^2 (z_{1-\alpha} + z_{1-\beta})^2}{(\mu_0 - \mu_a)^2} \qquad (22) \]
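The development behind formula (22) can be sketched in three steps. The sketch assumes that \(\bar{x}\) is normally distributed with variance \(\sigma^2/n\) under both hypotheses and takes the alternative \(\mu_a > \mu_0\); the symbol \(c\) for the critical value is introduced here only for illustration. The mirror-image argument for \(\mu_a < \mu_0\) leads to the same \(n\), because the difference enters only as a square.

```latex
\begin{align*}
P(\bar{x} > c \mid \mu = \mu_0) = \alpha
  &\;\Rightarrow\; c = \mu_0 + z_{1-\alpha}\,\sigma/\sqrt{n}, \\
P(\bar{x} > c \mid \mu = \mu_a) = 1-\beta
  &\;\Rightarrow\; c = \mu_a - z_{1-\beta}\,\sigma/\sqrt{n}, \\
\mu_0 + z_{1-\alpha}\,\sigma/\sqrt{n} = \mu_a - z_{1-\beta}\,\sigma/\sqrt{n}
  &\;\Rightarrow\; n = \frac{\sigma^2 (z_{1-\alpha} + z_{1-\beta})^2}{(\mu_0 - \mu_a)^2}.
\end{align*}
```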

Example 1.7.3

A survey had indicated that the average weight of men over 55 years of age with newly diagnosed heart disease was 90 kg. However, it is suspected that the average weight of such men is now somewhat lower. How large a sample would be necessary to test, at the 5% level of significance with a power of 90%, whether the average weight is unchanged versus the alternative that it has decreased from 90 to 85 kg with an estimated standard deviation of 20 kg?

Solution

Using formula (22):

\[ n = \frac{20^2 (1.645 + 1.282)^2}{(90 - 85)^2} = 137.08. \]

Therefore, a sample of 138 men over 55 years of age with newly diagnosed heart disease would be required.


Note that, as was the case for population proportions, in order to calculate \(n\), the values of \(\alpha\), \(\beta\), \(\mu_0\) and \(\mu_a\) must be specified. A similar approach is followed when the alternative is two-sided, that is, when we wish to test

\[ H_0: \mu = \mu_0 \]

versus

\[ H_a: \mu \neq \mu_0. \]

In this situation, the null hypothesis is rejected if \(\bar{x}\) is too large or too small. We assign area \(\alpha/2\) to each tail of the sampling distribution under \(H_0\). The only adjustment to formula (22) is that \(z_{1-\alpha/2}\) is used in place of \(z_{1-\alpha}\), resulting in

\[ n = \frac{\sigma^2 (z_{1-\alpha/2} + z_{1-\beta})^2}{(\mu_0 - \mu_a)^2} \qquad (23) \]

Example 1.7.4

A two-sided test of Example 1.7.3 could be designed to test the hypothesis that the average weight has not changed versus the alternative that the average weight has changed, and that a difference of 5 kg would be considered important.

Solution

Using formula (23) with \(z_{1-\alpha/2} = 1.960\), \(z_{1-\beta} = 1.282\) and \(\sigma = 20\),

\[ n = \frac{20^2 (1.960 + 1.282)^2}{(5)^2} = 168.17. \]

Thus, 169 men would be required for the sample if the alternative were two-sided.
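The one-sided and two-sided calculations of Examples 1.7.3 and 1.7.4 differ only in the quantile used for the type I error, which a short Python sketch makes explicit. The function name and the `two_sided` flag are illustrative. The normal quantiles are computed with the standard library's `statistics.NormalDist` instead of the rounded table values, so the unrounded result differs slightly in the second decimal place, but the rounded-up sample sizes agree with the worked solutions.

```python
import math
from statistics import NormalDist

def n_one_sample_test(sigma, mu0, mua, alpha, beta, two_sided=False):
    """Sample size for testing a single population mean.

    One-sided: n = sigma^2 (z_{1-alpha} + z_{1-beta})^2 / (mu0 - mua)^2
    Two-sided: the same, with z_{1-alpha/2} in place of z_{1-alpha}.
    """
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)  # critical value for type I error
    z_beta = NormalDist().inv_cdf(1 - beta)   # quantile for the desired power
    n = sigma ** 2 * (z_alpha + z_beta) ** 2 / (mu0 - mua) ** 2
    return math.ceil(n)

# Weight example: sigma = 20 kg, 90 kg versus 85 kg, alpha = 0.05, power = 90%.
one_sided = n_one_sample_test(20, 90, 85, alpha=0.05, beta=0.10)
two_sided = n_one_sample_test(20, 90, 85, alpha=0.05, beta=0.10, two_sided=True)
print(one_sided, two_sided)  # 138 169
```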

The two-sample problem


We now focus on estimating the difference between two population means and on testing hypotheses concerning the equality of means in two groups.

Estimating the difference between two means

The difference between two population means represents a new parameter, \(\mu_1-\mu_2\). An estimate of this parameter is given by the difference in the sample means, \(\bar{x}_1-\bar{x}_2\). The mean of the sampling distribution of \(\bar{x}_1-\bar{x}_2\) is

\[ E(\bar{x}_1-\bar{x}_2) = \mu_1-\mu_2 \]

and the variance of this distribution is

\[ \mathrm{Var}(\bar{x}_1-\bar{x}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}. \]

For simplicity, we will assume that \(\sigma_1^2=\sigma_2^2=\sigma^2\). Under this assumption the variances are said to be homoskedastic, and the formula for the variance of the difference can be simplified to

\[ \mathrm{Var}(\bar{x}_1-\bar{x}_2) = \sigma^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right). \]

The value \(\sigma^2\) is an unknown population parameter, which can be estimated from sample or pilot data by pooling the individual sample variances, \(s_1^2\) and \(s_2^2\), to form the pooled variance, \(s_p^2\):

\[ s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{(n_1-1)+(n_2-1)}, \]

where \(n_1\) and \(n_2\) are the sample sizes in the pilot study.

If, in addition, the same number of observations is selected from each of the two populations (\(n_1=n_2=n\)), then

\[ \mathrm{Var}(\bar{x}_1-\bar{x}_2) = \frac{2\sigma^2}{n}. \]

Following the same logic used in estimating a single population mean, the quantity \(d\) denotes the distance, in either direction, from the population difference, \(\mu_1-\mu_2\), and may be expressed as

\[ d = z_{1-\alpha/2}\sqrt{2\sigma^2/n}. \]

Solving this expression for \(n\), it follows that

\[ n = \frac{z_{1-\alpha/2}^2\,[2\sigma^2]}{d^2} \qquad (24) \]

Example 1.7.5

Nutritionists wish to estimate the difference in caloric intake at lunch between children in a school offering a hot school lunch program and children in a school which does not. From other nutrition studies, they estimate that the standard deviation in caloric intake among elementary school children is 75 calories, and they wish to make their estimate to within 20 calories of the true difference with 95% confidence.

Solution

Using formula (24),

\[ n = \frac{(1.960)^2\,[2(75^2)]}{20^2} = 108.05. \]

Thus, 109 children from each school should be studied.
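As a check on this example, the sketch below computes the per-group sample size and then confirms that the resulting half-width of the confidence interval is no larger than the 20 calories requested. It assumes the relation d = z·sqrt(2σ²/n) under equal group sizes and a common variance; the function names are illustrative.

```python
import math

def n_per_group_for_difference(z, sigma, d):
    """Per-group sample size to estimate mu1 - mu2 to within d.

    Implements n = z^2 * 2 * sigma^2 / d^2, assuming equal group sizes
    and a common standard deviation sigma in both populations.
    """
    return math.ceil(z ** 2 * 2 * sigma ** 2 / d ** 2)

def half_width(z, sigma, n):
    """Half-width d = z * sqrt(2 sigma^2 / n) achieved with n per group."""
    return z * math.sqrt(2 * sigma ** 2 / n)

# Values from Example 1.7.5: sigma = 75 calories, d = 20 calories, 95% confidence.
n_children = n_per_group_for_difference(z=1.960, sigma=75, d=20)
print(n_children)  # 109

# The achieved half-width with 109 children per school should not exceed 20.
print(half_width(1.960, 75, n_children) <= 20)  # True
```

Because \(d\) enters as a square, halving the desired distance roughly quadruples the number of children needed in each school.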

Hypothesis testing for two population means


Suppose a study is designed to test \(H_0: \mu_1=\mu_2\) versus \(H_a: \mu_1\neq\mu_2\). The mean of the sampling distribution of \(\bar{x}_1-\bar{x}_2\) under \(H_0\) is 0 and the variance is

\[ \mathrm{Var}(\bar{x}_1-\bar{x}_2) = 2\sigma^2/n. \]

Now, suppose we would like to know how many observations to take in order to be \(100(1-\beta)\)% confident of rejecting \(H_0\) when, in fact, the true difference between the population means is \(\mu_1-\mu_2 = \delta\).

Following a strategy similar to that employed in developing formula (7), it follows that

\[ n = \frac{2\sigma^2 (z_{1-\alpha/2} + z_{1-\beta})^2}{(\mu_1-\mu_2)^2} \qquad (25) \]

Since the value of \(\sigma^2\) would not ordinarily be known, it could be estimated from a pilot study using \(s_p^2\). The quantity \(\mu_1-\mu_2\) represents that difference considered to be of sufficient practical significance to warrant detection.

Example 1.7.6

Suppose a study is being designed to measure the effect, on systolic blood pressure, of lowering sodium in the diet. From a pilot study it is observed that the standard deviation of systolic blood pressure in a community with a high sodium diet is 12 mmHg while that in a group with a low sodium diet is 10.3 mmHg. If \(\alpha=0.05\) and \(\beta=0.10\), how large a sample from each community should be selected if we want to be able to detect a 2 mmHg difference in blood pressure between the two communities?

Solution

Pooling the two variances, \(s_p^2 = [s_1^2 + s_2^2]/2 = [144.0 + 106.1]/2 = 125.05\). (This computation assumes that the pilot study used equal sample sizes; otherwise a weighted average would be used.) This value is used in place of \(\sigma^2\) in formula (25) to test

\[ H_0: \mu_1-\mu_2 = 0 \]

versus

\[ H_a: \mu_1-\mu_2 \neq 0, \]

where, specifically, \(\mu_1-\mu_2 = 2\) is used as the alternative. This gives:

\[ n = \frac{2(125.05)\,[1.960 + 1.282]^2}{2^2} = 657.17. \]

Hence, a sample of 658 subjects would be needed in each of the two groups.
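The two steps of this solution, pooling the pilot variances and then applying the two-sided sample size formula, can be sketched as follows. The pilot sample sizes are not given in the example, so the value of 25 per group used below is purely hypothetical; with equal pilot sizes the weighted pooled variance reduces to the simple average used in the solution. The z-values are the rounded table values from the text, passed in explicitly so that the arithmetic matches the worked example.

```python
import math

def pooled_variance(s1, n1, s2, n2):
    """Pooled variance: sample variances weighted by their degrees of freedom."""
    return ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / ((n1 - 1) + (n2 - 1))

def n_per_group_two_sided(variance, difference, z_alpha_half, z_beta):
    """n = 2 sigma^2 (z_{1-alpha/2} + z_{1-beta})^2 / (mu1 - mu2)^2."""
    return 2 * variance * (z_alpha_half + z_beta) ** 2 / difference ** 2

# Pilot standard deviations from Example 1.7.6; pilot sizes of 25 are hypothetical.
s2_pooled = pooled_variance(s1=12.0, n1=25, s2=10.3, n2=25)

# Rounded table values: z_{0.975} = 1.960, z_{0.90} = 1.282; difference of 2 mmHg.
n_raw = n_per_group_two_sided(s2_pooled, difference=2.0,
                              z_alpha_half=1.960, z_beta=1.282)
n_subjects = math.ceil(n_raw)
print(n_subjects)  # 658

# With unequal (again hypothetical) pilot sizes, the larger pilot sample
# receives more weight in the pooled estimate.
print(pooled_variance(s1=12.0, n1=40, s2=10.3, n2=20))
```

The pooled variance here is 125.045, not the 125.05 of the text, because the text rounds 10.3² to 106.1; the rounded-up sample size is unaffected.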

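A similar approach is followed when the alternative is one-sided, that is, when testing \(H_0: \mu_1-\mu_2=0\) versus \(H_a: \mu_1-\mu_2>0\). The sample size necessary in this situation is

\[ n = \frac{2\sigma^2 (z_{1-\alpha} + z_{1-\beta})^2}{(\mu_1-\mu_2)^2} \qquad (26) \]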

Example 1.7.7

A study is being planned to test whether a dietary supplement for pregnant women will increase the birthweight of babies. One group of women will receive the new supplement and the other group will receive the usual nutrition consultation. From a pilot study, the standard deviation in birthweight is estimated at 500 g and is assumed to be the same for both groups. The hypothesis of no difference is to be tested at the 5% level of significance. It is desired to have 80% power (\(\beta=0.20\)) of detecting an increase of 100 g.

Solution

Using formula (26) it follows that

\[ n = \frac{2(500)^2\,[1.645 + 0.842]^2}{(100)^2} = 309.26. \]

Hence, a sample of 310 subjects should be studied in each of the two groups.
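One way to sanity-check a sample size of this kind is to run the calculation in reverse and ask what power a given \(n\) delivers. The sketch below does both for the birthweight example. It relies on the same normal approximation with known, common \(\sigma\) that underlies the formula; the function names are illustrative, and the quantiles come from `statistics.NormalDist`, not from rounded table values.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and sd 1

def n_per_group_one_sided(sigma, difference, alpha, beta):
    """n = 2 sigma^2 (z_{1-alpha} + z_{1-beta})^2 / (mu1 - mu2)^2, rounded up."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    z_beta = STD_NORMAL.inv_cdf(1 - beta)
    return math.ceil(2 * sigma ** 2 * (z_alpha + z_beta) ** 2 / difference ** 2)

def power_one_sided(n, sigma, difference, alpha):
    """Approximate power of the one-sided test with n subjects per group."""
    standard_error = math.sqrt(2 * sigma ** 2 / n)
    return STD_NORMAL.cdf(difference / standard_error
                          - STD_NORMAL.inv_cdf(1 - alpha))

# Values from Example 1.7.7: sigma = 500 g, increase of 100 g, alpha = 0.05, beta = 0.20.
n_women = n_per_group_one_sided(sigma=500, difference=100, alpha=0.05, beta=0.20)
print(n_women)  # 310

# Power at the chosen n, and with one subject fewer per group.
print(power_one_sided(n_women, 500, 100, 0.05))
print(power_one_sided(n_women - 1, 500, 100, 0.05))
```

Under these assumptions the power with 310 per group is just above the 80% target, and one subject fewer per group falls just below it.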


Because of the wide range of possible parameter values, it is not possible to present a comprehensive set of sample size tables. Rather than provide a limited number of tables, only the formulae are presented.