The sampling distribution of the odds ratio

74 Foundations of Sampling and Statistical Theory

given disease presence or absence. That is, define

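$P_1^* = \mathrm{Prob}\{E \mid D\}, \quad 1 - P_1^* = \mathrm{Prob}\{\bar{E} \mid D\}$

$P_2^* = \mathrm{Prob}\{E \mid \bar{D}\}, \quad 1 - P_2^* = \mathrm{Prob}\{\bar{E} \mid \bar{D}\}.$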

It follows that and that

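In terms of the notation of the 2×2 table in Table II.4, the parameters may be estimated as follows:

$\hat{P}_1 = a/m_1, \quad 1 - \hat{P}_1 = c/m_1$

$\hat{P}_2 = b/m_2, \quad 1 - \hat{P}_2 = d/m_2$

$\hat{P}_1^* = a/n_1, \quad 1 - \hat{P}_1^* = b/n_1$

$\hat{P}_2^* = c/n_2, \quad 1 - \hat{P}_2^* = d/n_2$

Estimates of the relative risk and odds ratio may be defined as:

$\widehat{RR} = \hat{P}_1/\hat{P}_2 = (a/m_1)/(b/m_2) = am_2/(bm_1)$

$\widehat{OR} = \dfrac{\hat{P}_1/(1-\hat{P}_1)}{\hat{P}_2/(1-\hat{P}_2)} = \dfrac{a/c}{b/d} = ad/(bc)$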

Since this expression involves unknown population parameters, an estimate of this quantity may be obtained as

Var[ln(OR)] = 1/a + 1/b + 1/c + 1/d.
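The point estimates and the variance estimate above can be sketched in a few lines of Python. This is a minimal illustration, not part of the manual: the function name is hypothetical, and the cell layout (a, b in the diseased row; a, c in the exposed column) is an assumption inferred from the estimates given in the text for Table II.4.

```python
def two_by_two_estimates(a, b, c, d):
    """Point estimates from a 2x2 table.

    Assumed layout (inferred, not confirmed by the source):
      a = diseased and exposed,      b = diseased and unexposed,
      c = non-diseased and exposed,  d = non-diseased and unexposed.
    """
    m1, m2 = a + c, b + d            # exposed and unexposed totals
    rr = (a / m1) / (b / m2)         # relative risk, a*m2 / (b*m1)
    odds_ratio = (a * d) / (b * c)   # cross-product ratio ad/bc
    var_log_or = 1 / a + 1 / b + 1 / c + 1 / d  # estimated Var[ln(OR)]
    return rr, odds_ratio, var_log_or


# Hypothetical counts, chosen only to exercise the function.
rr, odds_ratio, var_log_or = two_by_two_estimates(20, 10, 80, 90)
print(rr, odds_ratio, var_log_or)
```

All four cells must be non-zero for the odds ratio and its variance estimate to be defined.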

Fig. 18 shows the use of ln(OR) and the relationship between the confidence limits as defined on the two scales.

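[Figure: curve of ln(OR) plotted against OR. The vertical axis marks $\ln(\widehat{OR}_L) = \ln(\widehat{OR}) - z\,SE[\ln(\widehat{OR})]$, $\ln(\widehat{OR})$, and $\ln(\widehat{OR}_U) = \ln(\widehat{OR}) + z\,SE[\ln(\widehat{OR})]$; the horizontal axis marks $\widehat{OR}_L$, $\widehat{OR}$, and $\widehat{OR}_U$.]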

Fig. 18 Plot of confidence interval for ln(OR) versus confidence interval for OR

Fig. 18 shows that, whereas the confidence interval established for ln(OR) is symmetric, when the limits are transformed into the original scale the transformed interval is not symmetric about OR.
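The asymmetry can also be seen algebraically. The short derivation below is a sketch added for illustration; the symbol $k$ is introduced here and does not appear in the original text.

```latex
\begin{aligned}
k &= \exp\{z\,SE[\ln(\widehat{OR})]\} > 1,\\
\widehat{OR}_L &= \exp\{\ln(\widehat{OR}) - z\,SE[\ln(\widehat{OR})]\} = \widehat{OR}/k,\\
\widehat{OR}_U &= \exp\{\ln(\widehat{OR}) + z\,SE[\ln(\widehat{OR})]\} = \widehat{OR}\,k,\\
\widehat{OR}_U - \widehat{OR} &= \widehat{OR}\,(k-1) \;>\; \widehat{OR}\,\frac{k-1}{k} = \widehat{OR} - \widehat{OR}_L .
\end{aligned}
```

The limits are therefore equidistant from the estimate on the ratio scale (each differs from it by the factor $k$), so the upper limit always lies farther from the estimate than the lower limit on the original scale.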

If a statistical test of the null hypothesis $H_0: OR = 1$ versus the alternative $H_a: OR \neq 1$ is to be performed, the usual chi-square ($\chi^2$) test based on the 2×2 table may be used. If the resulting calculated value of $\chi^2$ is too large, the null hypothesis is rejected. (The previously described test of the equality of two proportions is equivalent to this chi-square test based on the 2×2 table.)

To illustrate these concepts, consider the following example.

Example II.7.1

Thirty patients with cancer have been identified along with 150 controls who were in the hospital at the same time but for other conditions. Twenty-four of the cancer patients smoke while 90 of the non-cancer patients smoke. The data layout is as follows:

                 Cancer status
Smoking        Yes      No    Total
Yes             24      90      114
No               6      60       66
Total           30     150      180


For these data the estimated odds ratio is (24)(60)/[(90)(6)] = 2.67. To determine whether 2.67 is significantly different from 1, test the hypothesis $H_0: OR = 1$ versus $H_a: OR \neq 1$ by computing the usual chi-square test with one degree of freedom.

Here, $\chi^2 = 4.31$, which is significant at the 5% level. [Note that the critical region is: reject $H_0$ if $\chi^2 > \chi^2_{1-\alpha}(1)$.] Hence we reject $H_0$.
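The test statistic can be checked with a short Python sketch. It assumes the uncorrected Pearson statistic for a 2×2 table (no continuity correction), which is the version that gives 4.31 for these data; the function name and the constant are illustrative, and 3.841 is the standard upper 5% point of the chi-square distribution with one degree of freedom.

```python
def pearson_chi_square(a, b, c, d):
    """Uncorrected Pearson chi-square statistic for a 2x2 table with rows (a, b) and (c, d)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d        # row totals
    col1, col2 = a + c, b + d        # column totals
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Smoking-by-cancer counts from the example table.
chi2 = pearson_chi_square(24, 90, 6, 60)

CRITICAL_5PCT_1DF = 3.841            # upper 5% point, chi-square with 1 df
reject_h0 = chi2 > CRITICAL_5PCT_1DF
print(round(chi2, 2), reject_h0)     # prints: 4.31 True
```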

A confidence interval estimate for the population odds ratio is obtained by first taking the natural logarithm of the estimated odds ratio, utilizing the standard error of ln(OR) to construct a confidence interval for ln(OR), and finally exponentiating to obtain a confidence interval for the odds ratio. In this example, the log odds ratio is ln(2.67) = 0.981. The estimated variance of the log odds ratio is:

Var[ln(OR)] = 1/24 + 1/90 + 1/6 + 1/60 = 0.236.

The 95% confidence interval for ln(OR) is:

$0.981 - 1.96\sqrt{0.236} \le \ln(OR) \le 0.981 + 1.96\sqrt{0.236}$

$0.029 \le \ln(OR) \le 1.93$

Converting to original units it follows that

$e^{0.029} \le OR \le e^{1.93}$

$1.03 \le OR \le 6.9$.
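The three steps (take logs, build the interval, exponentiate) can be sketched in Python as follows. The function name and the default z = 1.96 are illustrative choices, not taken from the manual.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Confidence interval for the odds ratio built on the ln(OR) scale."""
    odds_ratio = (a * d) / (b * c)
    log_or = math.log(odds_ratio)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(log_or - z * se_log_or)   # back-transform lower limit
    upper = math.exp(log_or + z * se_log_or)   # back-transform upper limit
    return odds_ratio, lower, upper


# Counts from the smoking and cancer example.
or_hat, or_lower, or_upper = odds_ratio_ci(24, 90, 6, 60)
print(or_hat, or_lower, or_upper)   # limits of roughly 1.03 and 6.9, as in the text
```

Because the limits are obtained by exponentiation, the upper limit lies farther from the estimate than the lower limit does.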

Since 1 does not fall in this interval, we again see that we would reject $H_0$. The two approaches need not always agree, since the chi-square test and the method for confidence interval estimation are based on different distributional assumptions: the former on the distribution of $(P_1 - P_2)$ and the latter on that of ln(OR). The reader should see Fleiss (17) for a complete discussion of the various methods for estimation and testing of the odds ratio.

The literature on estimation and hypothesis testing about odds ratios is very large, and most attention has focused on the situation where n1, n2, m1 and m2 (see Table II.4) are small.

Since the goal of this manual is sample size determination, and since these sample sizes will tend to be large, the methods based on ln(OR) will be appropriate.

The sampling distribution of RR will, for extremely large samples, be approximated by a normal distribution. However, for sample sizes typically employed in most epidemiologic studies, the sampling distribution of RR will often not be normal, with considerable skewness to the right (as was the case with OR). A logarithmic transformation is again employed which induces more symmetry into the sampling distribution, allowing use of the normal distribution to approximate the sampling distribution for smaller sample sizes.

Thus, as was the case with the odds ratio, confidence interval estimation of the RR is usually performed by first obtaining a confidence interval for ln(RR) and later exponentiating the confidence limits to obtain a confidence interval for the RR parameter.

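Recall that the relative risk is estimated from a cohort study as

$\widehat{RR} = \hat{p}_1/\hat{p}_2$

and it follows, from standard methods used to obtain the variance of a function of a random variable and from the independence of the two samples, that:

$\mathrm{Var}[\ln(\widehat{RR})] = \mathrm{Var}[\ln(\hat{p}_1) - \ln(\hat{p}_2)] = \mathrm{Var}[\ln(\hat{p}_1)] + \mathrm{Var}[\ln(\hat{p}_2)]$.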

Now, since the variance of the natural logarithm of any proportion p based on n observations is approximately

$\mathrm{Var}[\ln(p)] = \dfrac{1}{n}\cdot\dfrac{1-P}{P}$,

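it follows that the variance may be estimated as

$\widehat{\mathrm{Var}}[\ln(\widehat{RR})] = \dfrac{1-\hat{p}_1}{m_1\hat{p}_1} + \dfrac{1-\hat{p}_2}{m_2\hat{p}_2}$,

where $m_1$ and $m_2$ are the numbers of subjects in the two exposure groups (Table II.4).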

This expression for the estimate of the variance of the natural logarithm of the relative risk is then used in precisely the same way as was the expression for the variance of ln(OR) in the construction of confidence intervals. Again, the limits of the symmetric confidence interval for ln(RR) must be exponentiated in order to obtain a confidence interval for the relative risk. This interval will not be symmetric, as was illustrated in Fig. 18 for OR.
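A minimal Python sketch of this procedure follows. The function name, its parameter names, and the cohort counts are hypothetical and chosen only for illustration; the variance is obtained by substituting the sample proportions into the expression for Var[ln(p)] given above.

```python
import math


def relative_risk_ci(cases_exposed, n_exposed, cases_unexposed, n_unexposed, z=1.96):
    """Confidence interval for the relative risk built on the ln(RR) scale."""
    p1 = cases_exposed / n_exposed
    p2 = cases_unexposed / n_unexposed
    rr = p1 / p2
    # Sum of the two estimated Var[ln(p)] terms, (1 - p) / (n * p).
    var_log_rr = (1 - p1) / (n_exposed * p1) + (1 - p2) / (n_unexposed * p2)
    half_width = z * math.sqrt(var_log_rr)
    # Exponentiating ln(rr) -/+ half_width gives rr / e^hw and rr * e^hw.
    return rr, rr * math.exp(-half_width), rr * math.exp(half_width)


# Hypothetical cohort: 30 of 100 exposed and 15 of 100 unexposed develop disease.
rr_hat, rr_lower, rr_upper = relative_risk_ci(30, 100, 15, 100)
print(rr_hat, rr_lower, rr_upper)
```

As with the odds ratio, the resulting limits are equidistant from the estimate on the log scale but not on the original scale.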

Source document: Stanley Lemeshow, David W Hosmer Jr, Janelle Klar, and Stephen K. Lwanga (pages 83-86)