
In the document Biostatistical Methods in Epidemiology (Pages 94–99)

3.2 ASYMPTOTIC METHODS

3.2.2 Odds and Log-Odds Transformations

Point Estimate

Recall from Section 2.2.2 that for π ≠ 1 the odds is defined to be ω = π/(1−π). For 0 < π < 1, we define the log-odds to be log(ω) = log[π/(1−π)]. In this book the only logarithm considered is the logarithm to the base e. The maximum likelihood estimates of ω and log(ω) are

ω̂ = π̂/(1−π̂) = a/(r−a)   (3.10)

and

log(ω̂) = log[π̂/(1−π̂)] = log[a/(r−a)].   (3.11)

If either a or r−a equals 0, we replace (3.10) and (3.11) with

ω̂ = (a+.5)/(r−a+.5)

and

log(ω̂) = log[(a+.5)/(r−a+.5)].

Haldane (1955) and Anscombe (1956) showed that log(ω̂) is less biased when .5 is added to a and r−a, whether they are 0 or not. This practice does not appear to be in widespread use and so it will not be followed here.
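As a numerical check, the point estimates (3.10)–(3.11) and the .5 correction for zero counts can be sketched in Python (the function name is illustrative):

```python
import math

def odds_estimates(a, r):
    """ML estimates of the odds and log-odds from a binomial sample of
    a cases out of r trials (equations 3.10 and 3.11).
    Adds .5 to a and r - a when either count is 0."""
    b = r - a
    if a == 0 or b == 0:
        a, b = a + 0.5, b + 0.5
    omega = a / b
    return omega, math.log(omega)

omega, log_omega = odds_estimates(10, 50)  # a = 10, r = 50
print(round(omega, 3), round(log_omega, 3))  # prints: 0.25 -1.386
```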

Figures 1.1(b)–1.5(b) and Figures 1.1(c)–1.5(c) show graphs of the distributions of ω̂ and log(ω̂), respectively, corresponding to the binomial distributions in Figures 1.1(a)–1.5(a). Evidently, ω̂ can be highly skewed, especially when r is small. On the other hand, log(ω̂) is relatively symmetric, but no more so than the untransformed distribution. On the basis of these findings, there seems to be little incentive to consider either the odds or log-odds transformations in preference to the untransformed distribution when analyzing single sample binomial data. As will be demonstrated in Chapter 4, the log-odds ratio transformation has an important role to play when analyzing data using odds ratio methods.

Confidence Interval

The maximum likelihood estimate of var[log(ω̂)] is

v̂ar[log(ω̂)] = 1/[π̂(1−π̂)r] = 1/a + 1/(r−a).   (3.12)

If either a or r−a equals 0, we replace (3.12) with

v̂ar[log(ω̂)] = 1/(a+.5) + 1/(r−a+.5).

Gart and Zweifel (1967) showed that v̂ar[log(ω̂)] is less biased when .5 is added to a and r−a, whether they are 0 or not. Similar to the situation with log(ω̂), this convention does not appear to be widely accepted and so it will not be adopted in this book. A (1−α)×100% confidence interval for log(ω) is

[log(ω̲), log(ω̄)] = log[π̂/(1−π̂)] ± zα/2 / √[π̂(1−π̂)r].   (3.13)

To obtain [π̲, π̄] we first exponentiate (3.13) to get [ω̲, ω̄], and then use

π̲ = ω̲/(1+ω̲)   (3.14)

and

π̄ = ω̄/(1+ω̄)   (3.15)

to determine π̲ and π̄. Since the exponential function is nonnegative, it follows from (3.13) that ω̲ and ω̄ are always nonnegative, and hence that π̲ and π̄ are always between 0 and 1.
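The interval (3.13) together with the back-transformation (3.14)–(3.15) can be sketched as follows (a minimal illustration; the function name is ours, and the simplification 1/a + 1/(r−a) = 1/[π̂(1−π̂)r] from (3.12) is used for the standard error):

```python
import math

def logodds_ci(a, r, z=1.96):
    """Confidence interval for pi via the log-odds transformation,
    per (3.13)-(3.15). Returns (pi_lower, pi_upper)."""
    p = a / r
    center = math.log(p / (1 - p))
    half = z / math.sqrt(p * (1 - p) * r)  # z times sqrt of (3.12)
    lo, hi = center - half, center + half
    # back-transform each bound: omega -> pi = omega / (1 + omega)
    w_lo, w_hi = math.exp(lo), math.exp(hi)
    return w_lo / (1 + w_lo), w_hi / (1 + w_hi)

print(logodds_ci(10, 50))  # roughly (0.111, 0.333), matching Example 3.3
```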

Hypothesis Test

Under the null hypothesis H0: π = π0, the maximum likelihood estimates of the mean and variance of log(ω̂) are E0[log(ω̂)] = log[π0/(1−π0)] and var0[log(ω̂)] = 1/[π0(1−π0)r]. A test of H0 is

X² = {log[π̂/(1−π̂)] − log[π0/(1−π0)]}² π0(1−π0)r   (df = 1).   (3.16)

Example 3.3 Let a = 10 and r = 50. From (3.8) and (3.13), 95% confidence intervals are

[π̲, π̄] = .2 ± 1.96 √[.2(.8)/50] = [.089, .311]

and

[log(ω̲), log(ω̄)] = log(.2/.8) ± 1.96/√[.2(.8)(50)] = [−2.08, −.693].   (3.17)

Exponentiating (3.17) results in [ω̲, ω̄] = [.125, .500], and applying (3.14) and (3.15) gives [π̲, π̄] = [.111, .333].

An approach to determining whether a method of estimation is likely to produce satisfactory results is to perform a simulation study, also called a Monte-Carlo study. This proceeds by programming a random number generator to create a large number of replicates of a hypothetical study. From these “data,” results based on different methods of estimation are compared to quantities that were used to program the random number generator. In most simulation studies, exact methods tend to perform better than asymptotic methods, especially when the sample size in each replicate is small. Consequently it is useful to compare asymptotic and exact estimates as in the following examples, with exact results used as the benchmark.
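A minimal sketch of such a simulation, here estimating the coverage of the explicit interval (3.8) against its nominal 95% level (the parameter values, replicate count, and function names are illustrative):

```python
import math
import random

def wald_ci(a, r, z=1.96):
    """Explicit (Wald) interval (3.8): pi_hat +/- z * sqrt(pi_hat(1-pi_hat)/r)."""
    p = a / r
    half = z * math.sqrt(p * (1 - p) / r)
    return p - half, p + half

def coverage(pi, r, reps=10_000, seed=1):
    """Fraction of simulated replicates whose interval contains the true pi."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        a = sum(rng.random() < pi for _ in range(r))  # binomial draw
        if 0 < a < r:  # the Wald interval is degenerate at a = 0 or a = r
            lo, hi = wald_ci(a, r)
            hits += lo <= pi <= hi
    return hits / reps

print(coverage(0.2, 50))  # typically somewhat below the nominal .95
```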

Example 3.4 Table 3.2 gives 95% confidence intervals for π, where, in each case, π̂ = .2. When a = 10, the implicit and log-odds methods perform quite well compared to the exact approach. To a lesser extent this is true for a = 2 and a = 5. The explicit method does not compare as favorably, especially for a = 2, where the lower bound is a negative number.
TABLE 3.2 95% Confidence Intervals (%) for π

                      a=2, r=10       a=5, r=25       a=10, r=50
Method                π̲      π̄       π̲      π̄       π̲      π̄
Exact                 2.52   55.61    6.83   40.70    10.03  33.72
Implicit (3.6, 3.7)   5.67   50.98    8.86   39.13    11.24  33.04
Explicit (3.8)        −4.79  44.79    4.32   35.68    8.91   31.09
Log-odds (3.13)       5.04   54.07    8.58   39.98    11.11  33.33

TABLE 3.3 p-Values for Hypothesis Tests of H0: π = .4

                          a=2     a=5     a=10
Method                    r=10    r=25    r=50
Exact (cumulative)        .334    .043    .004
No-transformation (3.9)   .197    .041    .004
Log-odds (3.16)           .129    .016    <.001

Example 3.5 Table 3.3 gives p-values for hypothesis tests of H0: π = .4, where, in each case, π̂ = .2. With the exact p-value based on the cumulative method as the benchmark, the no-transformation method performs somewhat better than the log-odds approach.
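The statistic (3.16) can be checked numerically. The sketch below reproduces the log-odds row of Table 3.3; the χ²(1) p-value is obtained through the standard normal via the identity P(χ²₁ > x) = erfc(√(x/2)):

```python
import math

def logodds_test(a, r, pi0):
    """Log-odds chi-square statistic (3.16), df = 1, and its p-value."""
    p = a / r
    diff = math.log(p / (1 - p)) - math.log(pi0 / (1 - pi0))
    x2 = diff ** 2 * pi0 * (1 - pi0) * r
    pval = math.erfc(math.sqrt(x2 / 2))  # P(chi2_1 > x2)
    return x2, pval

# compare with the log-odds row of Table 3.3
for a, r in [(2, 10), (5, 25), (10, 50)]:
    x2, p = logodds_test(a, r, 0.4)
    print(f"a={a}, r={r}: X2={x2:.2f}, p={p:.4f}")
```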

Example 3.6 In Chapter 4, data are presented from a closed cohort study in which 192 female breast cancer patients were followed for up to 5 years with death from breast cancer as the endpoint of interest. There were a = 54 deaths and so π̂ = 54/192 = .281. The 95% confidence interval based on the implicit method is [.222, .349].
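Assuming the implicit method of (3.6)–(3.7) is the score (Wilson) construction, which solves |π̂ − π| = zα/2 √[π(1−π)/r] for π, the interval of Example 3.6 can be reproduced (the function name is ours):

```python
import math

def score_ci(a, r, z=1.96):
    """Score ("implicit") interval: the two roots in pi of
    (pi_hat - pi)^2 = z^2 * pi(1-pi)/r."""
    p = a / r
    z2 = z * z
    center = (p + z2 / (2 * r)) / (1 + z2 / r)
    half = (z / (1 + z2 / r)) * math.sqrt(p * (1 - p) / r + z2 / (4 * r * r))
    return center - half, center + half

lo, hi = score_ci(54, 192)  # a = 54 deaths among r = 192 patients
print(round(lo, 3), round(hi, 3))  # prints: 0.222 0.349
```

The same function reproduces the implicit row of Table 3.2, e.g. score_ci(10, 50) gives about (.1124, .3304).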

C H A P T E R 4

Odds Ratio Methods for Unstratified Closed Cohort Data

In Chapter 2 we compared the measurement properties of the odds ratio, risk ratio and risk difference. None of these measures of effect was found to be superior to the other two in every respect. In this chapter we discuss odds ratio methods for analyzing data from a closed cohort study (Section 2.2.1). The reason for giving precedence to the odds ratio is that there is a wider range of statistical techniques available for this measure of effect than for either the risk ratio or risk difference.

Thus the initial focus on the odds ratio reflects an organizational approach and is not meant to imply that the odds ratio is somehow “better” than the risk ratio or risk difference for analyzing closed cohort data. However, compared to the risk ratio and risk difference, it is true that methods based on the odds ratio are more readily applied to other epidemiologic study designs. As shown in Chapter 9, odds ratio methods for closed cohort studies can be used to analyze censored survival data; and, as discussed in Chapter 11, these same techniques can be adapted to the case-control setting.

For the most part, the material in this chapter has been organized according to whether methods are exact or asymptotic on the one hand, and unconditional or conditional on the other. This produces four broad categories: exact unconditional, asymptotic unconditional, exact conditional, and asymptotic conditional. Not all odds ratio methods fit neatly into this scheme, but the classification is useful. Within each of the categories we focus primarily on three topics: point estimation, (confidence) interval estimation, and hypothesis testing. For certain categories some of these topics will not be covered because the corresponding methods are not in wide use or their exposition requires a level of mathematical sophistication beyond the scope of this book. Exact unconditional methods will not be considered at all for several reasons: They can be intensely computational; they offer few, if any, advantages over the other techniques to be described; and they are rarely, if ever, used in practice. In Sections 4.1–4.5 we discuss odds ratio methods for tables in which the exposure is dichotomous, and in Section 4.6 we consider the case of a polychotomous exposure variable. General references for this chapter and the next are Breslow and Day (1980), Fleiss (1981), Sahai and Khurshid (1996), and Lachin (2000).

ISBN: 0-471-36914-4

4.1 ASYMPTOTIC UNCONDITIONAL METHODS
