4.2 EXACT CONDITIONAL METHODS FOR A SINGLE 2×2 TABLE
Hypergeometric Distribution
Once we have conditioned on $m_1$, the random variables $A_1$ and $A_2$ are no longer independent. Specifically, we have the constraint $A_1 + A_2 = m_1$, and so $A_2$ is completely determined by $A_1$ (and vice versa). As a result of conditioning on $m_1$ we have gone from two independent binomial random variables to a single random variable corresponding to the index cell. We continue to denote the random variable in question by $A_1$, allowing the context to make clear which probability model is being considered. As shown in Appendix C, conditioning on $m_1$ results in a (noncentral) hypergeometric distribution. The probability function is
$$
P(A_1 = a_1 \mid OR) = \frac{1}{C} \binom{r_1}{a_1} \binom{r_2}{m_1 - a_1} OR^{a_1} \tag{4.13}
$$
where
$$
C = \sum_{x=l}^{u} \binom{r_1}{x} \binom{r_2}{m_1 - x} OR^{x}.
$$
Viewed as a hypergeometric random variable, $A_1$ has the sample space $\{l, l+1, \ldots, u\}$, where $l = \max(0, r_1 - m_2)$ and $u = \min(r_1, m_1)$. Here max and min mean that $l$ is the maximum of 0 and $r_1 - m_2$, and $u$ is the minimum of $r_1$ and $m_1$. Since $r_1 - m_2 = (r - r_2) - (r - m_1) = m_1 - r_2$, $l$ is sometimes written as $\max(0, m_1 - r_2)$. Evidently, $l \geq 0$ and $u \leq r_1$, and so the hypergeometric sample space of $A_1$ is contained in the binomial sample space. For a given set of marginal totals, the hypergeometric distribution is completely determined by the parameter $OR$. Therefore, by conditioning on $m_1$ we have eliminated the nuisance parameter $\pi_2$. The numerator of (4.13) gives the distribution its basic shape, and the denominator $C$ ensures that (1.1) is satisfied. From (1.2) and (1.3), the hypergeometric mean and variance are
$$
E(A_1 \mid OR) = \frac{1}{C} \sum_{x=l}^{u} x \binom{r_1}{x} \binom{r_2}{m_1 - x} OR^{x} \tag{4.14}
$$
and
$$
\operatorname{var}(A_1 \mid OR) = \frac{1}{C} \sum_{x=l}^{u} \left[x - E(A_1 \mid OR)\right]^2 \binom{r_1}{x} \binom{r_2}{m_1 - x} OR^{x}. \tag{4.15}
$$
Unfortunately, (4.13), (4.14), and (4.15) do not usually simplify to less complicated expressions. An instance where simplification does occur is when $OR = 1$. In this case we say that $A_1$ has a central hypergeometric distribution. For the central hypergeometric distribution,
$$
P_0(A_1 = a_1) = \frac{\dbinom{r_1}{a_1} \dbinom{r_2}{m_1 - a_1}}{\dbinom{r}{m_1}}
= \frac{r_1!\, r_2!\, m_1!\, m_2!}{a_1!\,(m_1 - a_1)!\,(r_1 - a_1)!\,(r_2 - m_1 + a_1)!\, r!} \tag{4.16}
$$
$$
e_1 = E_0(A_1) = \frac{r_1 m_1}{r} \tag{4.17}
$$
and
$$
v_0 = \operatorname{var}_0(A_1) = \frac{r_1 r_2 m_1 m_2}{r^2 (r - 1)}. \tag{4.18}
$$
Since $m_1$ is now being treated as a constant, $e_1$ and $v_0$ are the exact mean and variance rather than just estimates. However, for the sake of uniformity of notation, we will denote these quantities by $\hat{e}_1$ and $\hat{v}_0$ in what follows. Observe that, other than $r!$, the denominator of the final expression in (4.16) is the product of factorials defined in terms of the interior cells of Table 4.7. A convenient method of tabulating a central hypergeometric probability function is to form each of the possible 2×2 tables and calculate probability elements using (4.16).
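To make the preceding calculations concrete, the following Python sketch tabulates the noncentral hypergeometric probability function (4.13) and its mean (4.14) and variance (4.15) directly from the marginal totals. The function names (hypergeom_pmf, hypergeom_mean_var) and the numerical check against (4.17) and (4.18) are illustrative choices, not part of the development above.

```python
# Illustrative sketch of (4.13)-(4.15); names and the test margins are assumptions.
from math import comb

def hypergeom_pmf(r1, r2, m1, OR=1.0):
    """Return {a1: P(A1 = a1 | OR)} over the sample space {l, ..., u}."""
    l, u = max(0, m1 - r2), min(r1, m1)
    weights = {x: comb(r1, x) * comb(r2, m1 - x) * OR**x for x in range(l, u + 1)}
    C = sum(weights.values())                      # normalizing constant in (4.13)
    return {x: w / C for x, w in weights.items()}

def hypergeom_mean_var(r1, r2, m1, OR=1.0):
    """Conditional mean (4.14) and variance (4.15)."""
    pmf = hypergeom_pmf(r1, r2, m1, OR)
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return mean, var

# Check the central case (OR = 1) against (4.17) and (4.18) for arbitrary margins.
r1, r2, m1 = 10, 15, 8
r, m2 = r1 + r2, r1 + r2 - m1
mean, var = hypergeom_mean_var(r1, r2, m1)
assert abs(mean - r1 * m1 / r) < 1e-12
assert abs(var - r1 * r2 * m1 * m2 / (r**2 * (r - 1))) < 1e-12
```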
Confidence Interval
Since the hypergeometric distribution involves the single parameter $OR$, the approach to exact interval estimation and hypothesis testing is a straightforward adaptation of the techniques described for the binomial distribution in Sections 3.1.1 and 3.1.2. An exact $(1 - \alpha) \times 100\%$ confidence interval for $OR$ is obtained by solving the equations
$$
\frac{\alpha}{2} = P(A_1 \geq a_1 \mid \underline{OR}_c)
= \frac{1}{\underline{C}_c} \sum_{x=a_1}^{u} \binom{r_1}{x} \binom{r_2}{m_1 - x} (\underline{OR}_c)^{x}
= 1 - \frac{1}{\underline{C}_c} \sum_{x=l}^{a_1 - 1} \binom{r_1}{x} \binom{r_2}{m_1 - x} (\underline{OR}_c)^{x}
$$
and
$$
\frac{\alpha}{2} = P(A_1 \leq a_1 \mid \overline{OR}_c)
= \frac{1}{\overline{C}_c} \sum_{x=l}^{a_1} \binom{r_1}{x} \binom{r_2}{m_1 - x} (\overline{OR}_c)^{x}
= 1 - \frac{1}{\overline{C}_c} \sum_{x=a_1 + 1}^{u} \binom{r_1}{x} \binom{r_2}{m_1 - x} (\overline{OR}_c)^{x}
$$
for $\underline{OR}_c$ and $\overline{OR}_c$, where $\underline{C}_c$ and $\overline{C}_c$ stand for $C$ with $\underline{OR}_c$ and $\overline{OR}_c$ substituted for $OR$, respectively.
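These equations can be solved numerically. The sketch below, which builds on the hypergeom_pmf function defined earlier, finds the limits by bisection on the odds ratio; the bracketing interval, tolerance, and the name exact_or_ci are assumptions made for illustration, and for very large tables the weights would need to be accumulated on a log scale to avoid overflow.

```python
# Illustrative root-finding sketch for the exact conditional limits; uses
# hypergeom_pmf from the earlier sketch. Bracket and tolerance are arbitrary.
def exact_or_ci(a1, r1, r2, m1, alpha=0.05, lo=1e-6, hi=1e6, tol=1e-10):
    def upper_tail(OR):  # P(A1 >= a1 | OR), increasing in OR
        return sum(p for x, p in hypergeom_pmf(r1, r2, m1, OR).items() if x >= a1)

    def lower_tail(OR):  # P(A1 <= a1 | OR), decreasing in OR
        return sum(p for x, p in hypergeom_pmf(r1, r2, m1, OR).items() if x <= a1)

    def solve(f, target, increasing):
        a, b = lo, hi
        for _ in range(200):
            mid = (a * b) ** 0.5             # geometric bisection: OR is a ratio
            if (f(mid) < target) == increasing:
                a = mid
            else:
                b = mid
            if b - a < tol * a:
                break
        return (a * b) ** 0.5

    l, u = max(0, m1 - r2), min(r1, m1)
    # At the endpoints of the sample space the corresponding limit is taken
    # to be 0 or infinity, by convention.
    or_lower = 0.0 if a1 == l else solve(upper_tail, alpha / 2, increasing=True)
    or_upper = float("inf") if a1 == u else solve(lower_tail, alpha / 2, increasing=False)
    return or_lower, or_upper
```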
Fisher’s Exact Test
It is possible to test hypotheses of the form $H_0: OR = OR_0$ for an arbitrary choice of $OR_0$ but, in practice, interest is mainly in the hypothesis of no association $H_0: OR = 1$. The exact test of association based on the central hypergeometric distribution is referred to as Fisher's (exact) test (Fisher, 1936; §21.02). The tail probabilities are
$$
P_0(A_1 \geq a_1) = \sum_{x=a_1}^{u} \frac{\dbinom{r_1}{x} \dbinom{r_2}{m_1 - x}}{\dbinom{r}{m_1}}
= 1 - \sum_{x=l}^{a_1 - 1} \frac{\dbinom{r_1}{x} \dbinom{r_2}{m_1 - x}}{\dbinom{r}{m_1}}
$$
and
$$
P_0(A_1 \leq a_1) = \sum_{x=l}^{a_1} \frac{\dbinom{r_1}{x} \dbinom{r_2}{m_1 - x}}{\dbinom{r}{m_1}}
= 1 - \sum_{x=a_1 + 1}^{u} \frac{\dbinom{r_1}{x} \dbinom{r_2}{m_1 - x}}{\dbinom{r}{m_1}}.
$$
Calculation of the two-sided $p$-value using either the cumulative or doubling method follows precisely the steps described for the binomial distribution in Section 3.1.1.
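As a companion to the tail probabilities above, the following self-contained sketch computes the one-sided tails and two-sided $p$-values for Fisher's test. The doubling method is simply twice the smaller tail; for the cumulative method the sketch adopts one common convention (summing the probabilities of all outcomes no more probable than the observed $a_1$), which is broadly consistent with the calculations in Examples 4.4 and 4.5 but may differ in detail from the rule of Section 3.1.1.

```python
# Illustrative sketch of Fisher's exact test for a single 2x2 table.
from math import comb

def fisher_exact_pvalues(a1, r1, r2, m1):
    """One-sided tail probabilities and two two-sided p-value conventions."""
    l, u = max(0, m1 - r2), min(r1, m1)
    C = comb(r1 + r2, m1)                     # denominator of (4.16), with r = r1 + r2
    pmf = {x: comb(r1, x) * comb(r2, m1 - x) / C for x in range(l, u + 1)}
    upper = sum(p for x, p in pmf.items() if x >= a1)   # P0(A1 >= a1)
    lower = sum(p for x, p in pmf.items() if x <= a1)   # P0(A1 <= a1)
    p_doubling = min(1.0, 2 * min(lower, upper))
    # "Cumulative" convention assumed here: all outcomes no more probable than a1.
    p_cumulative = sum(p for p in pmf.values() if p <= pmf[a1] * (1 + 1e-9))
    return lower, upper, p_cumulative, p_doubling
```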
Recall the discussion in Chapter 3 regarding the conservative nature of an exact test when the distribution is discrete. This conservatism, which is a feature of Fisher's test, is more pronounced when the sample size is small. This is precisely the condition under which an asymptotic test, such as Pearson's test, becomes invalid. These issues have led to a protracted debate regarding the relative merits of these two tests when the sample size is small. Currently, Fisher's test appears to be regarded more favorably (Yates, 1984; Little, 1989).
Example 4.3 (Hypothetical Data) Data from a hypothetical cohort study are given in Table 4.8. For these data, $l = 1$ and $u = 3$. Note that 0, which is an element of the binomial sample space of $A_1$, cannot be an element of the hypergeometric sample space since that would force the lower right cell count to be $-1$.
The central hypergeometric probability function is given in Table 4.9. The mean and variance are $\hat{e}_1 = 1.80$ and $\hat{v}_0 = .36$.
The noncentral hypergeometric probability function corresponding to Table 4.8 is
$$
P(A_1 = a_1 \mid OR) = \frac{1}{C} \binom{3}{a_1} \binom{2}{3 - a_1} OR^{a_1}
$$
where
$$
C = \sum_{x=1}^{3} \binom{3}{x} \binom{2}{3 - x} OR^{x} = 3\,OR + 6\,OR^2 + OR^3.
$$
TABLE 4.8 Observed Counts: Hypothetical Cohort Study

                 Disease
  Exposure    yes    no
  yes           2     1     3
  no            1     1     2
                3     2     5
TABLE 4.9 Central Hypergeometric Probability Function: Hypothetical Cohort Study

  a1    P0(A1 = a1)
  1     3!2!3!2!/(1!2!2!0!5!) = .3
  2     3!2!3!2!/(2!1!1!1!5!) = .6
  3     3!2!3!2!/(3!0!0!2!5!) = .1
The exact conditional 95% confidence interval for $OR$ is $[.013, 234.5]$, which is obtained by solving the equations
$$
.025 = \sum_{x=2}^{3} P(A_1 = x \mid \underline{OR}_c)
= \frac{6(\underline{OR}_c)^2 + (\underline{OR}_c)^3}{3\,\underline{OR}_c + 6(\underline{OR}_c)^2 + (\underline{OR}_c)^3}
$$
and
$$
.025 = \sum_{x=1}^{2} P(A_1 = x \mid \overline{OR}_c)
= \frac{3\,\overline{OR}_c + 6(\overline{OR}_c)^2}{3\,\overline{OR}_c + 6(\overline{OR}_c)^2 + (\overline{OR}_c)^3}
$$
for $\underline{OR}_c$ and $\overline{OR}_c$.
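As a check, the illustrative routines sketched earlier in this section reproduce these results numerically, assuming hypergeom_pmf and exact_or_ci are defined as above:

```python
# Numerical check of Example 4.3 (r1 = 3, r2 = 2, m1 = 3, a1 = 2), reusing the
# hypothetical hypergeom_pmf and exact_or_ci sketches defined earlier.
print(hypergeom_pmf(3, 2, 3))      # central pmf {1: 0.3, 2: 0.6, 3: 0.1}, as in Table 4.9
print(exact_or_ci(2, 3, 2, 3))     # approximately (0.013, 234.5)
```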
Example 4.4 (Antibody–Diarrhea) For the data in Table 4.3, $l = 3$ and $u = 14$.
The central hypergeometric distribution is given in Table 4.10.
The exact conditional 95% confidence interval for $OR$ is [1.05, 86.94], which is quite wide and just misses containing 1.
TABLE 4.10 Central Hypergeometric Probability Function (%): Antibody–Diarrhea
a1 P0(A1=a1) P0(A1≤a1) P0(A1≥a1)
3 <.01 <.01 100
4 .03 .03 99.99
5 .44 .47 99.97
6 3.08 3.55 99.53
7 11.43 14.98 96.45
8 24.01 38.99 85.02
9 29.35 68.34 61.01
10 20.96 89.31 31.66
11 8.58 97.88 10.69
12 1.91 99.79 2.12
13 .21 99.99 .21
14 .01 100 .01
TABLE 4.11 Central Hypergeometric Probability Function (%): Receptor Level–Breast Cancer

a1 P0(A1=a1) P0(A1≤a1) P0(A1≥a1)
... ... ... ...
3 <.01 <.01 100
4 .01 .02 99.99
5 .07 .09 99.98
... ... ... ...
11 9.91 23.13 86.78
12 12.88 36.01 76.87
13 14.54 50.55 63.99
14 14.33 64.88 49.45
15 12.37 77.25 35.12
16 9.39 86.64 22.75
... ... ... ...
22 .13 99.94 .19
23 .04 99.98 .06
24 .01 99.99 .02
... ... ... ...
The $p$-value for Fisher's test based on the cumulative method is $P_0(A_1 \geq 12) + P_0(A_1 \leq 5) = .026$, and based on the doubling method it is $2 \times P_0(A_1 \geq 12) = .042$. For these data, there is a noticeable difference between the cumulative and doubling results, but in either case we infer that low antibody level is associated with an increased risk of diarrhea. A comparison of the preceding results with those of Example 4.1 illustrates that exact confidence intervals tend to be wider than asymptotic ones, and exact $p$-values are generally larger than their asymptotic counterparts.
Example 4.5 (Receptor Level–Breast Cancer) For Table 4.5(a), $l = 0$ and $u = 48$. The central hypergeometric distribution is given, in part, in Table 4.11.
The exact conditional 95% confidence interval for $OR$ is [1.58, 7.07], and the $p$-value for Fisher's test based on the cumulative method is $P_0(A_1 \geq 23) + P_0(A_1 \leq 4) = .08\%$. The remark made in Example 4.4 about exact results being conservative holds here (except for Pearson's test), as may be seen from a comparison with Example 4.2. However, when the sample size is large, the differences between exact and asymptotic findings are often of little practical importance, as is the case here.
4.3 ASYMPTOTIC CONDITIONAL METHODS