RANDOM SAMPLING - Biostatistical Methods in Epidemiology

size does not seem to be an issue. However, a major consideration is that we need to know the variances $\sigma_i^2$ prior to using the weighted least squares approach, and in practice this information is almost never available. Therefore it is usually necessary to estimate the $\sigma_i^2$ from study data, in which case the weights are random variables rather than constants. So instead of (1.21) and (1.22) we have

$$\hat{\theta} = \frac{1}{\hat{W}} \sum_{i=1}^{n} \hat{w}_i \hat{\theta}_i \tag{1.23}$$

and

$$\operatorname{var}(\hat{\theta}) = \frac{1}{\hat{W}} \tag{1.24}$$

where $\hat{w}_i = 1/\hat{\sigma}_i^2$ and $\hat{W} = \sum_{i=1}^{n} \hat{w}_i$. When the $\sigma_i^2$ are estimated from large samples the desirable properties of (1.21) and (1.22) described above carry over to (1.23) and (1.24), that is, $\hat{\theta}$ is asymptotically unbiased with minimum variance.
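As an illustration of (1.23) and (1.24), the following minimal Python sketch computes the weighted estimate and its variance from a set of estimates and estimated variances. The function name `weighted_estimate` and the input numbers are hypothetical and chosen only for illustration; they do not come from the text.

```python
def weighted_estimate(estimates, variances):
    """Weighted estimate per (1.23) and its variance per (1.24).

    estimates: the individual estimates (theta-hat_i)
    variances: the estimated variances (sigma-hat_i squared), all positive
    """
    if len(estimates) != len(variances):
        raise ValueError("estimates and variances must have the same length")
    if any(v <= 0 for v in variances):
        raise ValueError("estimated variances must be positive")
    # Weights are the reciprocals of the estimated variances.
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)  # W-hat
    theta_hat = sum(w * t for w, t in zip(weights, estimates)) / total_weight
    var_hat = 1.0 / total_weight
    return theta_hat, var_hat


# Hypothetical inputs, for illustration only.
theta_hat, var_hat = weighted_estimate([0.5, 0.7, 0.6], [0.04, 0.01, 0.02])
print(theta_hat, var_hat)
```

The estimate with the smallest estimated variance receives the largest weight, so the combined estimate is pulled toward it.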

a of them are cases. The simple random sample estimate of the prevalence rate is $\hat{\pi}_{\mathrm{srs}} = a/r$, which has the variance $\operatorname{var}(\hat{\pi}_{\mathrm{srs}}) = \pi(1-\pi)/r$.

1.3.2 Stratified Random Sampling

Suppose that the prevalence rate increases with age. Simple random sampling ensures that, on average, the sample will have the same age distribution as the population. However, in a given prevalence study it is possible for a particular age group to be underrepresented or even absent from a simple random sample. Stratified random sampling avoids this difficulty by permitting the investigator to specify the proportion of the total sample that will come from each age group (stratum). For stratified random sampling to be possible it is necessary to know in advance the number of individuals in the population in each stratum. For example, stratification by age could be based on a census list, provided information on age is available. Once the strata have been created, a simple random sample is drawn from each stratum, resulting in a stratified random sample.

Suppose there are $n$ strata. For the $i$th stratum we make the following definitions: $N_i$ is the number of individuals in the population, $\pi_i$ is the prevalence rate, $r_i$ is the number of subjects in the simple random sample, and $a_i$ is the number of cases among the $r_i$ subjects $(i = 1, 2, \ldots, n)$. Let $N = \sum_{i=1}^{n} N_i$, $a = \sum_{i=1}^{n} a_i$ and

$$r = \sum_{i=1}^{n} r_i. \tag{1.25}$$

For a stratified random sample, along with the $N_i$, the $r_i$ must also be known prior to data collection. We return shortly to the issue of how to determine the $r_i$, given an overall sample size of $r$. For the moment we require only that the $r_i$ satisfy the constraint (1.25). Since a simple random sample is chosen in each stratum, an estimate of $\pi_i$ is $\hat{\pi}_i = a_i/r_i$, which has the variance $\operatorname{var}(\hat{\pi}_i) = \pi_i(1-\pi_i)/r_i$. The stratified random sample estimate of the prevalence rate is

$$\hat{\pi}_{\mathrm{str}} = \sum_{i=1}^{n} \left(\frac{N_i}{N}\right) \hat{\pi}_i \tag{1.26}$$

which is seen to be a weighted average of the $\hat{\pi}_i$. Since $E(\hat{\pi}_i) = \pi_i$, it follows from (1.7) that

$$E(\hat{\pi}_{\mathrm{str}}) = \sum_{i=1}^{n} \left(\frac{N_i}{N}\right) \pi_i = \pi$$

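and so $\hat{\pi}_{\mathrm{str}}$ is unbiased. Applying (1.8) to (1.26) gives

$$\operatorname{var}(\hat{\pi}_{\mathrm{str}}) = \sum_{i=1}^{n} \left(\frac{N_i}{N}\right)^2 \frac{\pi_i(1-\pi_i)}{r_i}. \tag{1.27}$$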
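The estimate (1.26) and the variance (1.27) can be sketched in a few lines of Python. The function name `stratified_prevalence` and the stratum counts below are hypothetical. Note one assumption that goes beyond the text: (1.27) is stated in terms of the true prevalences, and this sketch substitutes the estimated prevalences for them, so the second value returned is an estimated variance.

```python
def stratified_prevalence(N, r, a):
    """Stratified estimate per (1.26) and a plug-in version of (1.27).

    N: population sizes per stratum (N_i)
    r: sample sizes per stratum (r_i), all positive
    a: case counts per stratum (a_i)
    """
    if not (len(N) == len(r) == len(a)):
        raise ValueError("N, r and a must have the same length")
    N_total = sum(N)
    # Stratum-specific estimates: a_i / r_i.
    pi_hat = [a_i / r_i for a_i, r_i in zip(a, r)]
    # (1.26): weighted average with weights N_i / N.
    estimate = sum((N_i / N_total) * p for N_i, p in zip(N, pi_hat))
    # (1.27) with the estimates substituted for the unknown prevalences
    # (an assumption of this sketch).
    variance = sum(
        (N_i / N_total) ** 2 * p * (1 - p) / r_i
        for N_i, p, r_i in zip(N, pi_hat, r)
    )
    return estimate, variance


# Hypothetical three-stratum example, for illustration only.
N = [5000, 3000, 2000]
r = [80, 60, 60]
a = [4, 9, 18]
est, var = stratified_prevalence(N, r, a)
print(est, var)
```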

We now consider the issue of determining the $r_i$. There are a number of approaches that can be followed, each of which places particular conditions on the $r_i$. For example, according to the method of optimal allocation, the $r_i$ are chosen so that $\operatorname{var}(\hat{\pi}_{\mathrm{str}})$ is minimized. It can be shown that, based on this criterion,

$$r_i = \left( \frac{N_i \sqrt{\pi_i(1-\pi_i)}}{\sum_{i=1}^{n} N_i \sqrt{\pi_i(1-\pi_i)}} \right) r. \tag{1.28}$$

As can be seen from (1.28), in order to determine the $r_i$ it is necessary to know, or at least have reasonable estimates of, the $\pi_i$. Since this is one of the purposes of the prevalence study, it is therefore necessary to rely on findings from earlier prevalence studies or, when such studies are not available, have access to informed opinion.
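A minimal Python sketch of the allocation rule (1.28) follows. The function name `optimal_allocation` and the prior prevalence values are hypothetical, standing in for figures that would come from earlier studies or informed opinion. The sketch returns fractional sample sizes; how to round them to whole numbers is left open here and is not addressed by (1.28).

```python
from math import sqrt


def optimal_allocation(N, pi, r):
    """Stratum sample sizes r_i per (1.28).

    N:  population sizes per stratum (N_i)
    pi: assumed prevalences per stratum (pi_i), each in [0, 1]
    r:  overall sample size
    """
    if len(N) != len(pi):
        raise ValueError("N and pi must have the same length")
    # Numerator terms of (1.28): N_i * sqrt(pi_i * (1 - pi_i)).
    scores = [N_i * sqrt(p * (1 - p)) for N_i, p in zip(N, pi)]
    total = sum(scores)
    if total == 0:
        raise ValueError("allocation is undefined when every score is zero")
    return [r * s / total for s in scores]


# Hypothetical prior prevalences, for illustration only.
alloc = optimal_allocation([5000, 3000, 2000], [0.05, 0.15, 0.30], 200)
print(alloc)

# With equal prevalences the rule reduces to allocation in proportion to N_i.
equal = optimal_allocation([5000, 3000, 2000], [0.1, 0.1, 0.1], 200)
print(equal)
```

By construction the allocated sizes sum to the overall sample size, so the constraint (1.25) is satisfied before any rounding.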

Stratified random sampling should be considered only if it is known, or at least strongly suspected, that the $\pi_i$ vary across strata. Suppose that, unknown to the investigator, the $\pi_i$ are all equal, so that $\pi_i = \pi$ for all $i$. It follows from (1.28) that $r_i = (N_i/N)r$ and hence, from (1.27), that $\operatorname{var}(\hat{\pi}_{\mathrm{str}}) = \pi(1-\pi)/r$. This means that the variance obtained by optimal allocation, which is the smallest variance possible under stratified random sampling, equals the variance that would have been obtained from simple random sampling. Consequently, when there is a possibility that the $\pi_i$ are all equal, stratified random sampling should be avoided since the effort involved in stratification will not be rewarded by a reduction in variance.
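The algebra behind the equal-prevalence case can be written out as follows. This is a sketch that uses only (1.27), (1.28), and the definition of $N$ as the sum of the $N_i$; it assumes $0 < \pi < 1$ so that the common square-root factor can be cancelled.

```latex
% Step 1: set \pi_i = \pi in (1.28); the common factor cancels.
r_i = \left( \frac{N_i \sqrt{\pi(1-\pi)}}{\sqrt{\pi(1-\pi)} \sum_{i=1}^{n} N_i} \right) r
    = \left( \frac{N_i}{N} \right) r

% Step 2: substitute r_i = (N_i/N) r and \pi_i = \pi into (1.27).
\operatorname{var}(\hat{\pi}_{\mathrm{str}})
    = \sum_{i=1}^{n} \left( \frac{N_i}{N} \right)^2 \frac{\pi(1-\pi)}{(N_i/N)\, r}
    = \frac{\pi(1-\pi)}{r} \sum_{i=1}^{n} \frac{N_i}{N}
    = \frac{\pi(1-\pi)}{r}
```

The last equality holds because the stratum proportions $N_i/N$ sum to 1.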

Simple random sampling and stratified random sampling are conceptually and computationally straightforward. There are more complex methods of random sampling such as multistage sampling and cluster sampling. Furthermore, the various methods can be combined to produce even more elaborate sampling strategies. It will come as no surprise that as the method of sampling becomes more complicated so does the corresponding data analysis. In practice, most epidemiologic studies use relatively straightforward sampling procedures. Aside from prevalence studies, which may require complex sampling, the typical epidemiologic study is usually based on simple random sampling or perhaps stratified random sampling, but generally nothing more elaborate.

Most of the procedures in standard statistical packages, such as SAS (1987) and SPSS (1993), assume that data have been collected using simple random sampling or stratified random sampling. For more complicated sampling designs it is necessary to use a statistical package such as SUDAAN (Shah et al., 1996), which is specifically designed to analyze complex survey data. STATA (1999) is a statistical package that has capabilities similar to SAS and SPSS, but with the added feature of being able to analyze data collected using complex sampling. For the remainder of the book it will be assumed that data have been collected using simple random sampling unless stated otherwise.

CHAPTER 2

Measurement Issues in Epidemiology

Unlike laboratory research where experimental conditions can usually be carefully controlled, epidemiologic studies must often contend with circumstances over which the investigator may have little influence. This reality has important implications for the manner in which epidemiologic data are collected, analyzed, and interpreted.

This chapter provides an overview of some of the measurement issues that are important in epidemiologic research, an appreciation of which provides a useful perspective on the statistical methods to be discussed in later chapters. There are many references that can be consulted for additional material on measurement issues and study design in epidemiology; in particular, the reader is referred to Rothman and Greenland (1998).
