ASYMPTOTIC METHODS FOR A SINGLE 2 × I TABLE

TABLE 4.13 Summary of Antibody–Diarrhea Results

Result                       AU              EC              AC                CF
OR (estimate)                7.71            —               7.17              —
OR (confidence interval)     [1.28, 46.37]   [1.05, 86.94]   [1.24, 41.52]^a   [1.39, 40.65]
Association p-value          .01^b           .03^c           .02               —

^a Explicit   ^b Wald   ^c Cumulative

TABLE 4.14 Summary of Receptor Level–Breast Cancer Results

Result                       AU             EC             AC               CF
OR (estimate)                3.35           —              3.33             —
OR (confidence interval)     [1.68, 6.70]   [1.58, 7.07]   [1.67, 6.63]^a   [1.69, 6.67]
Association p-value          .001^b         <.001          <.001            —

^a Explicit   ^b Wald

exposure is dichotomous. In this section we describe asymptotic unconditional and asymptotic conditional methods for the analysis of 2 × I tables, where $I \ge 2$.

The manner in which exposure categories are defined in a given study depends on a number of considerations—in particular, whether the exposure variable is continuous, discrete, or ordinal. An ordinal variable is one that is qualitative and where there is an implicit ordering of categories. For example, arthritis pain might be rated as mild, moderate, or severe. Stage of breast cancer is also ordinal, even though integers are used to designate the different stages. Discrete and ordinal variables are automatically in categorized form. In certain settings it may be reasonable to regard a discrete variable with many categories as continuous. For example, the number of cigarettes smoked per day is, strictly speaking, discrete, but in many applications it would be treated as a continuous variable.

When the exposure variable is continuous, categories can be created by selecting cutpoints to partition the range of exposures. To the extent possible, it is desirable to have categories that are consistent with the published literature. For instance, in Example 4.2, the continuous variable receptor level was dichotomized using a conventional cutpoint. The sample size of the study and the distribution of the exposure variable in the data also have implications for the choice of cutpoints, and hence for the number and width of categories. In particular, if a predetermined set of cutpoints results in categories that have few or even no subjects, it may be necessary to collapse over categories so as to avoid sparse data problems. When categories are created, it is implicitly assumed that, within each category, the association between exposure and disease is relatively uniform. This assumption may be violated when the categories are made too wide. It sometimes happens that neither substantive knowledge nor study data suggest a method of creating categories, making the choice of cutpoints somewhat arbitrary. In this situation, one option is to use percentiles as cutpoints. For example, quartiles can be formed using the 25th, 50th, and 75th percentiles. This results in four ordered categories consisting of the same (or nearly the same) number of subjects.
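As a rough sketch of the percentile option, the following Python snippet forms quartile categories from a continuous exposure. The exposure values and the helper name `quartile_category` are hypothetical, chosen only for illustration, and `statistics.quantiles` is used with its default interpolation rule, which is just one of several conventions for computing percentiles.

```python
import bisect
import statistics

# Hypothetical continuous exposure measurements (illustrative values only).
exposure = [3.1, 4.8, 5.2, 6.0, 7.4, 8.9, 10.3, 11.7, 13.5, 15.2, 18.6, 24.0]

# The 25th, 50th, and 75th percentiles serve as the three cutpoints.
cutpoints = statistics.quantiles(exposure, n=4)


def quartile_category(value, cuts):
    """Return the 1-based category (1 = lowest exposure) for a value."""
    return bisect.bisect_right(cuts, value) + 1


categories = [quartile_category(x, cutpoints) for x in exposure]
counts = [categories.count(k) for k in (1, 2, 3, 4)]
print(cutpoints)
print(counts)
```

With 12 distinct values, each of the four categories receives three subjects. When the data contain ties at a cutpoint, the category sizes will be only approximately equal.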

The data layout for the case of $I \ge 2$ exposure categories is given in Table 4.15. It is usual to order the categories from low to high exposure so that $i = 1$ corresponds to the lowest exposure. Thus the orientation of categories in Table 4.15 is the opposite of the 2 × 2 case. We model the $i$th exposure category using the binomial distribution with parameters $(\pi_i, r_i)$ $(i = 1, 2, \ldots, I)$. The odds for the $i$th exposure category is $\omega_i = \pi_i/(1 - \pi_i)$. With $i = 1$ as the reference category, the odds ratio is

$$OR_i = \frac{\pi_i(1 - \pi_1)}{\pi_1(1 - \pi_i)}.$$

TABLE 4.15 Observed Counts: Closed Cohort Study

                   Exposure category
Disease    1      2      · · ·   i      · · ·   I
yes        a_1    a_2    · · ·   a_i    · · ·   a_I    m_1
no         b_1    b_2    · · ·   b_i    · · ·   b_I    m_2
           r_1    r_2    · · ·   r_i    · · ·   r_I    r

Point Estimates, Confidence Intervals, and Pearson and Mantel–Haenszel Tests of Association

The unconditional maximum likelihood estimates of $\omega_i$ and $OR_i$ are $\hat{\omega}_i = a_i/b_i$ and

$$\widehat{OR}_{ui} = \frac{a_i b_1}{a_1 b_i}$$

where we note that $\widehat{OR}_{u1} = 1$. A confidence interval for $OR_i$ can be estimated using (4.7). We say there is no association between exposure and disease if $\pi_1 = \pi_2 = \cdots = \pi_I$. The expected counts for the $i$th exposure category are

$$\hat{e}_i = \frac{r_i m_1}{r} \quad \text{and} \quad \hat{f}_i = \frac{r_i m_2}{r}.$$

It is readily verified that $\hat{e}_\bullet = a_\bullet = m_1$. It is possible to test each pair of categories for association using any of the tests for 2 × 2 tables described above. This involves $\binom{I}{2} = I(I-1)/2$ separate tests and, if $I$ is at all large, several of the tests may provide evidence for association even when it is absent, purely on the basis of chance (type I error). For example, with $I = 10$ there would be 45 hypothesis tests. With $\alpha = .05 = 1/20$, even if there is no association between exposure and disease, on average about two of the 45 tests ($45 \times .05 = 2.25$) would provide evidence in favor of association. This is an example of the problem of multiple comparisons, an issue that has received considerable attention in the epidemiologic literature (Rothman and Greenland, 1998). An approach that avoids this difficulty is to perform tests of association which consider all $I$ exposure categories simultaneously, as we now describe.
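The arithmetic behind the multiple comparisons example can be sketched in a few lines of Python. The variable names are illustrative, and the last quantity assumes the pairwise tests are independent, which is not the case here because tests that share an exposure category are correlated; it is shown only to convey the order of magnitude.

```python
from math import comb

n_categories = 10   # number of exposure categories I, as in the text
alpha = 0.05        # significance level of each pairwise test

n_tests = comb(n_categories, 2)        # I(I-1)/2 pairwise comparisons
expected_false_pos = n_tests * alpha   # expected number significant under no association

# Chance of at least one false positive IF the tests were independent
# (an assumption made only for illustration).
p_any_if_independent = 1 - (1 - alpha) ** n_tests

print(n_tests, expected_false_pos, round(p_any_if_independent, 2))
```

The expected count does not rely on independence, since expectations add regardless of correlation between tests.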

The Pearson test of association for a 2 × I table is

$$X_p^2 = \sum_{i=1}^{I} \left[ \frac{(a_i - \hat{e}_i)^2}{\hat{e}_i} + \frac{(b_i - \hat{f}_i)^2}{\hat{f}_i} \right] \quad (\text{df} = I - 1). \tag{4.35}$$

Note that there are $I - 1$ degrees of freedom. Using earlier arguments it can be shown that

$$X_p^2 = \frac{r}{m_2} \sum_{i=1}^{I} \frac{(a_i - \hat{e}_i)^2}{\hat{e}_i} \quad (\text{df} = I - 1). \tag{4.36}$$
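The step from (4.35) to (4.36) can be sketched using only the definitions of $\hat{e}_i$ and $\hat{f}_i$ given above. Since $a_i + b_i = r_i = \hat{e}_i + \hat{f}_i$, we have $b_i - \hat{f}_i = -(a_i - \hat{e}_i)$, so the two squared deviations in each term of (4.35) are equal. The two reciprocals combine as

$$
\frac{1}{\hat{e}_i} + \frac{1}{\hat{f}_i}
= \frac{r}{r_i m_1} + \frac{r}{r_i m_2}
= \frac{r(m_1 + m_2)}{r_i m_1 m_2}
= \frac{r^2}{r_i m_1 m_2}
= \frac{r}{m_2} \cdot \frac{1}{\hat{e}_i}.
$$

Multiplying by $(a_i - \hat{e}_i)^2$ and summing over $i$ gives the factor $r/m_2$ that multiplies the sum in (4.36).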

Conditioning on the total number of cases $m_1$ results in the multidimensional hypergeometric distribution (Appendix E). The Mantel–Haenszel test of association for a 2 × I table is

$$X_{mh}^2 = \frac{r - 1}{m_2} \sum_{i=1}^{I} \frac{(a_i - \hat{e}_i)^2}{\hat{e}_i} \quad (\text{df} = I - 1). \tag{4.37}$$

Observe that (4.35), (4.36), and (4.37) are generalizations of (4.8), (4.11), and (4.27), respectively. From (4.36) and (4.37), we have

$$X_{mh}^2 = \left( \frac{r - 1}{r} \right) X_p^2$$

just as in the dichotomous case. When the null hypothesis is rejected by either of the above tests, the interpretation is that overall there is evidence for an association between exposure and disease. This does not mean that each of the pairwise tests necessarily has a small p-value. Indeed, it is possible for the pairwise tests to individually provide little evidence for association and yet for the simultaneous test to indicate that an association is present.
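A minimal Python sketch of the two simultaneous tests follows, written from (4.36) and (4.37). The function name `association_tests` is an illustrative choice, not something defined in the text, and the input counts are those of Table 4.16 (stage of breast cancer by survival), which is analyzed in Example 4.11 later in this section.

```python
def association_tests(a, b):
    """Pearson (4.36) and Mantel-Haenszel (4.37) statistics for a 2 x I table.

    a[i] and b[i] are the counts with and without disease in category i.
    Returns (X2_p, X2_mh, df).
    """
    r_i = [ai + bi for ai, bi in zip(a, b)]   # column totals
    m1, m2 = sum(a), sum(b)                   # row totals
    r = m1 + m2
    e = [ri * m1 / r for ri in r_i]           # expected counts, diseased row
    core = sum((ai - ei) ** 2 / ei for ai, ei in zip(a, e))
    x2_p = (r / m2) * core
    x2_mh = ((r - 1) / m2) * core
    return x2_p, x2_mh, len(a) - 1


# Counts from Table 4.16 (stages I, II, III).
dead = [7, 26, 21]
alive = [60, 70, 8]
x2_p, x2_mh, df = association_tests(dead, alive)
print(f"X2_p = {x2_p:.2f}, X2_mh = {x2_mh:.2f}, df = {df}")
```

For these counts the sketch gives 38.55 and 38.35 on 2 degrees of freedom, in agreement with the values reported in Example 4.11. The two statistics differ only by the factor $(r-1)/r$, so they are nearly identical unless $r$ is small.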

Test for Linear Trend

The Pearson and Mantel–Haenszel tests of association are designed to detect whether the probability of disease differs across exposure categories. These are rather nonspecific tests in that they fail to take into account patterns that may exist in the data.

We now describe a test designed to detect linear trend. In order to apply this test, it is necessary to assign an exposure level (dose, score) to each category. For a continuous exposure variable, a reasonable approach is to define the exposure level for each category to be the midpoint of the corresponding cutpoints. As an illustration, for age groups 65–69, 70–74, and 75–79, the midpoints are 67.5, 72.5, and 77.5. A problem arises when there is an open-ended category since, in this case, the midpoint is undefined. For example, there is no obvious way of defining a midpoint for an age group such as 80+. An alternative that avoids this problem is to define the exposure level for each category to be the mean or median exposure based on study data.
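The two ways of assigning exposure levels described above can be sketched as follows. The cutpoints come from the age-group illustration in the text; the ages in the open-ended group are hypothetical values used only to show the median-based alternative.

```python
from statistics import median

# Cutpoints bounding the closed age groups 65-69, 70-74, and 75-79.
cutpoints = [65, 70, 75, 80]
midpoints = [(lo + hi) / 2 for lo, hi in zip(cutpoints, cutpoints[1:])]
print(midpoints)  # [67.5, 72.5, 77.5]

# Open-ended group 80+: no midpoint exists, so use the median of the
# observed ages instead (hypothetical ages, for illustration only).
ages_80_plus = [80, 81, 81, 83, 84, 86, 90]
score_80_plus = median(ages_80_plus)
print(score_80_plus)  # 83
```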

When the exposure variable is ordinal, the assignment of exposure levels is more complicated. For example, in the breast cancer study described in Example 4.2, there are three stages of disease: Stage I is less serious than stage II, which in turn is less serious than stage III. However, it is not clear how exposure levels should be assigned. In a case like this, it is usual to simply define the exposure levels to be the consecutive integers 1, 2, and 3. Defining exposure levels in this way implicitly assumes that the “distance” between stage I and stage II is the same as that between stage II and stage III. An assumption such as this ultimately depends on some notion of “severity” of disease, and therefore needs to be justified.

Let $s_i$ be the exposure level for the $i$th category with $s_1 < s_2 < \cdots < s_I$. The $\omega_i$ are unknown parameters, but we can imagine the scatter plot of $\log(\hat{\omega}_i)$ against $s_i$ $(i = 1, 2, \ldots, I)$. Let $\log(\hat{\omega}_i) = \hat{\alpha} + \hat{\beta} s_i$ be the “best-fitting” straight line for these points, where $\alpha$ and $\beta$ are constants. We are interested in testing the hypothesis $H_0\colon \beta = 0$. When $\beta \ne 0$ we say there is a linear trend in the log-odds, in which case the best-fitting straight line has a nonzero slope. As shown in Appendix E, the score test of $H_0\colon \beta = 0$, which will be referred to as the test for linear trend (in log-odds), is

$$X_t^2 = \frac{r - 1}{m_2} \cdot \frac{\left[ \sum_{i=1}^{I} s_i (a_i - \hat{e}_i) \right]^2}{\sum_{i=1}^{I} s_i^2 \hat{e}_i - \left( \sum_{i=1}^{I} s_i \hat{e}_i \right)^2 \big/ \hat{e}_\bullet} \quad (\text{df} = 1) \tag{4.38}$$

(Cochran, 1954; Armitage, 1955). Large values of $X_t^2$ provide evidence in favor of a linear trend. Although $X_t^2$ has been presented in terms of log-odds, it can be interpreted as a test for linear trend in probabilities, odds, or odds ratios. Accordingly, we can examine study data for the presence of linear trend using any of the corresponding category-specific parameter estimates.
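The statistic in (4.38) can be sketched in Python as below. The function name `trend_test` is an illustrative choice, and the p-value uses the identity that a chi-square variable with 1 degree of freedom is the square of a standard normal variable. The counts are those of Table 4.16, with the consecutive-integer scores used in Example 4.11.

```python
from math import erfc, sqrt


def trend_test(a, b, s):
    """Score test for linear trend (4.38) in a 2 x I table.

    a[i], b[i]: counts with and without disease in category i.
    s[i]: exposure level (score) assigned to category i.
    Returns (X2_t, p_value) with df = 1.
    """
    r_i = [ai + bi for ai, bi in zip(a, b)]
    m1, m2 = sum(a), sum(b)
    r = m1 + m2
    e = [ri * m1 / r for ri in r_i]                      # expected counts
    num = sum(si * (ai - ei) for si, ai, ei in zip(s, a, e)) ** 2
    den = (sum(si ** 2 * ei for si, ei in zip(s, e))
           - sum(si * ei for si, ei in zip(s, e)) ** 2 / sum(e))
    x2_t = ((r - 1) / m2) * num / den
    # Upper tail of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    return x2_t, erfc(sqrt(x2_t / 2))


dead, alive = [7, 26, 21], [60, 70, 8]   # Table 4.16
x2_t, p = trend_test(dead, alive, s=[1, 2, 3])
print(f"X2_t = {x2_t:.2f}, p = {p:.1e}")
```

With scores 1, 2, 3 the sketch gives $X_t^2 = 33.90$, matching Example 4.11. Because the test has a single degree of freedom, it concentrates on the monotone pattern that the $(I-1)$-degree-of-freedom tests spread across all possible departures from no association.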

It is important to appreciate that if $H_0$ is rejected (that is, if it is decided that a linear trend is present), it does not follow that the log-odds is a linear function of exposure (Rothman, 1986, p. 347; Maclure and Greenland, 1992). Instead the much more limited inference can be drawn that the “linear component” of the functional relationship relating log-odds to exposure has a nonzero slope. In many applications, especially when toxic exposures are being considered, it is reasonable to assume that, as exposure increases, there will be a corresponding increase in the risk of disease.

However, more complicated risk relationships are possible. For example, the risk of having a stroke is elevated when blood pressure is either too high or too low. Consequently the functional relationship between blood pressure and stroke has something of a J-shape. The best-fitting straight line to such a curve has a positive slope and so the hypothesis of no linear trend would be rejected, even though the underlying functional relationship is far from linear.
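The J-shape point can be illustrated with a small numerical sketch. The log-odds values below are hypothetical and are not taken from any study, and an unweighted least-squares line is used as a simple stand-in for the "best-fitting" line; it is not the score test itself.

```python
# Hypothetical J-shaped log-odds at five equally spaced exposure levels
# (illustrative values only, not taken from any study).
s = [1, 2, 3, 4, 5]
log_odds = [-1.0, -1.4, -1.2, -0.5, 0.6]

s_bar = sum(s) / len(s)
y_bar = sum(log_odds) / len(log_odds)

# Ordinary least-squares slope and intercept of the best-fitting line.
slope = (sum((si - s_bar) * (yi - y_bar) for si, yi in zip(s, log_odds))
         / sum((si - s_bar) ** 2 for si in s))
intercept = y_bar - slope * s_bar

# Residuals show the curvature that the straight line misses.
residuals = [yi - (intercept + slope * si) for si, yi in zip(s, log_odds)]
print(round(slope, 2))
print([round(res, 2) for res in residuals])
```

The fitted slope is positive, yet the residuals are positive at both ends and negative in the middle, which is the signature of a curved relationship that a single slope cannot describe.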

Example 4.11 (Stage–Breast Cancer) Table 4.16 gives the observed counts for the breast cancer data introduced in Example 4.2, but now with stage of disease as the exposure variable.

A useful place to begin the analysis is to compare stages II and III to stage I using 2 × 2 methods. Table 4.17 gives odds ratio estimates and 95% confidence intervals, with stage I as the reference category. As can be seen, there is an increasing trend in odds ratios across stages I, II, and III (where $\widehat{OR}_{u1} = 1$).
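The entries of Table 4.17 can be reproduced with the short sketch below. Formula (4.7) is not shown in this section, so the code assumes it is the usual Wald interval computed on the log scale; the function name `odds_ratio_wald` is likewise an illustrative choice. The counts are taken from Table 4.16.

```python
from math import exp, log, sqrt


def odds_ratio_wald(a_i, b_i, a_1, b_1, z=1.96):
    """Odds ratio for category i versus the reference, with a Wald interval.

    Assumes the interval is computed on the log scale with variance
    1/a_i + 1/b_i + 1/a_1 + 1/b_1 (an assumption about (4.7)).
    """
    or_hat = (a_i * b_1) / (a_1 * b_i)
    se = sqrt(1 / a_i + 1 / b_i + 1 / a_1 + 1 / b_1)
    return or_hat, exp(log(or_hat) - z * se), exp(log(or_hat) + z * se)


# Table 4.16: (dead, alive) by stage, with stage I as the reference.
counts = {"I": (7, 60), "II": (26, 70), "III": (21, 8)}
a_1, b_1 = counts["I"]
for stage in ("II", "III"):
    a_i, b_i = counts[stage]
    or_hat, lower, upper = odds_ratio_wald(a_i, b_i, a_1, b_1)
    print(f"{stage}: {or_hat:.2f} [{lower:.2f}, {upper:.2f}]")
```

Under this assumption the output agrees with Table 4.17 to two decimal places. The wide interval for stage III reflects the small count of 8 survivors in that category.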

TABLE 4.16 Observed Counts: Stage–Breast Cancer

                   Stage
Survival    I      II     III
dead        7      26     21      54
alive       60     70     8       138
            67     96     29      192

TABLE 4.17 Odds Ratio Estimates and 95% Confidence Intervals: Stage–Breast Cancer

Stage    OR_ui (estimate)    OR_ui (lower limit)    OR_ui (upper limit)
II       3.18                1.29                   7.85
III      22.50               7.27                   69.62

TABLE 4.18 Expected Counts: Stage–Breast Cancer

                   Stage
Survival    I        II       III
dead        18.84    27.00    8.16     54
alive       48.16    69.00    20.84    138
            67       96       29       192

The expected counts, given in Table 4.18, are all greater than 5. The Pearson and Mantel–Haenszel tests are $X_p^2 = 38.55$ $(p < .001)$ and $X_{mh}^2 = 38.35$ $(p < .001)$, both of which provide considerable evidence for an association between stage of disease and breast cancer mortality. Setting $s_1 = 1$, $s_2 = 2$, and $s_3 = 3$, the test for linear trend is

$$X_t^2 = \left( \frac{192 - 1}{138} \right) \frac{(24.69)^2}{200.25 - (97.31)^2/54} = 33.90 \quad (p < .001)$$

which is consistent with the observation made above. For the sake of illustration, suppose that the “severity” of stage III compared to stage II is regarded as three times the “severity” of stage II compared to stage I. For example, this determination might be based on an assessment of quality of life or projected mortality. With $s_1 = 1$, $s_2 = 2$, and $s_3 = 5$, the test for linear trend is $X_t^2 = 38.32$ $(p < .001)$, a finding that is close to the earlier result.
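The second result can be checked by hand from (4.38), using the observed counts in Table 4.16 and the unrounded expected counts $\hat{e}_i = r_i m_1/r$, which are 18.84375, 27, and 8.15625. With $s = (1, 2, 5)$:

$$
\begin{aligned}
\sum_i s_i (a_i - \hat{e}_i) &= (7 + 52 + 105) - (18.84375 + 54 + 40.78125) = 164 - 113.625 = 50.375, \\
\sum_i s_i^2 \hat{e}_i &= 18.84375 + 108 + 203.90625 = 330.75, \\
X_t^2 &= \frac{191}{138} \cdot \frac{(50.375)^2}{330.75 - (113.625)^2/54} \approx \frac{191}{138} \cdot \frac{2537.64}{91.66} \approx 38.32.
\end{aligned}
$$

Only the relative spacing of the scores matters. Replacing $s_i$ by $c + d s_i$ with $d \ne 0$ multiplies both the numerator and the denominator of (4.38) by $d^2$, because $\sum_i (a_i - \hat{e}_i) = 0$, and so leaves $X_t^2$ unchanged.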

CHAPTER 5

Odds Ratio Methods for Stratified Closed Cohort Data

In most epidemiologic studies it is necessary to consider confounding and effect modification, and usually this involves some form of stratified analysis. In this chapter we discuss odds ratio methods for closed cohort studies in which there is stratification. The asymptotic unconditional and asymptotic conditional methods presented here are generalizations of those given in Chapter 4. Exact conditional methods are not discussed because they involve especially detailed computations. Appendix B gives the derivations of many of the asymptotic unconditional formulas that appear in this chapter and in Chapters 6 and 7.

5.1 ASYMPTOTIC UNCONDITIONAL METHODS
