6.2.1 Case–control studies
Case–control studies are one of the most common approaches to examining the genetic bases of a disorder. It is very difficult in population-based studies to generate a sufficient number of cases of the disorder (D+) in a prospective study when the disorder of interest is, as it often is, quite rare. A favoured alternative has long been to do a retrospective case–control study, in which N1 subjects are sampled from among those who have already had onset of the disorder of interest, and N2 subjects from among those without that disorder.
In a case–control study, genotype can clearly be measured as reliably and accurately on the total sample after onset of the disorder as it could have been prospectively, before onset. Accurate measurement of environment, gene expression, or events prior to onset of the disorder, however, is very difficult. Memory is flawed, records are often incomplete, and, most important, recall is often coloured by the subject’s knowledge that they do or do not have the disorder in question at the time of assessment.
Moreover, Berkson’s Fallacy has been known since the 1950s [10, 11]: the samples of ‘cases’ and ‘controls’ may, for a variety of reasons, not be representative of the cases or the controls that would have resulted from a prospective study of a population. What one sees in a case–control study may well misrepresent what would have been seen in a prospective study of the same risk factors and outcomes in the same population.
In short, while the first studies to explore associations between risk factors and outcomes may well be case–control studies, accepting inferences from such studies as ‘scientific truth’ is risky. Instead, such studies should be used to generate strong hypotheses to be tested in subsequent prospective studies, and to yield the information necessary to design powerful and cost-effective prospective studies. The type of study design necessary for understanding gene moderation or mediation corresponds to the third type of study design discussed by Fitzmaurice and Ravichandran (Chapter 2), in which a random sample of subjects is drawn from a population and both binary characteristics are measured on each subject.
6.2.2 Statistical significance necessary but not sufficient
Concepts related to statistical hypothesis testing, such as ‘significance level’, ‘p-value’, and ‘power’, came into common use in biomedical research around the middle of the twentieth century. In recent years, however, there has been a growing realisation of the limitations of statistical hypothesis testing [12–18]. One prominent epidemiological journal actually banned the use of ‘p-values’ for a time, while psychology [14, 19] and medicine [20–22] tried to deal with the problem by urging that every p-value be accompanied by a clinically interpretable effect size and its confidence interval.
In the way such testing is commonly done (testing null hypotheses of randomness), ‘statistically significant’ generally means that the data are sufficient to demonstrate a non-random association: a comment on the data, not on the strength of the association. Any non-null association, no matter how trivial, can be shown to be ‘statistically significant’ provided the sample size is large enough. The crucial issue is whether the strength of association is enough to warrant further interest in that association. In dealing with how risk factors ‘work together’ in the MacArthur approach, effect sizes are decomposed to understand the contributions of various risk factors. In essence, this approach assumes that, when there is rationale and justification for suspecting an association between G and D, most, if not all, effect sizes are non-null, although many, perhaps most, may be trivial.
Certainly in all research into the risk factors (genetic or otherwise) for a disorder, it is necessary to establish statistical significance to warrant drawing any conclusions. It is not sufficient to stop there: a clinically interpretable effect size is necessary.
6.2.3 Odds ratio is not a clinically interpretable effect size
The most common effect size used in epidemiological and genetic research is the odds ratio. If the probability of D+ in the high risk group is Q1 and that in the low risk group is Q0, the odds of D+ in the two groups are Q1/(1 − Q1) and Q0/(1 − Q0). The odds ratio (OR) is the ratio of those two odds.
The odds ratio was originally introduced as the likelihood ratio test statistic to test the null hypothesis of random differences between the two groups (Q1 = Q0), and remains an excellent indicator of non-randomness [23]. OR = 1 means that Q1 = Q0; OR ≠ 1 indicates non-random association. Use of the odds ratio to test for non-random association, for example using logistic regression analysis, is both common and recommended. However, there are a number of arguments against the odds ratio as an interpretable effect size [23–27], all converging on one conclusion: the odds ratio should not be used as an effect size, but only as an indicator of non-randomness. Consider three questions: If not the odds ratio, then what? Why the odds ratio? Why not the odds ratio?
If not the odds ratio, then what? What has been shown is that, once one excludes the odds ratio from consideration, all the other common measures of 2×2 association correspond to one or another of the weighted kappa coefficients [25]. Which weighted kappa is appropriate in any given situation depends on the relative clinical importance of false positives to false negatives (which determines the weight in the weighted kappa). Thus, among commonly used measures of 2×2 association, the odds ratio is an outlier.
For the purpose of the present discussion, one commonly used such weighted kappa will be used, the risk difference RD = Q1 − Q0, where Q1 and Q0 are the incidence/prevalence of the disorder in the two groups compared. This is not to say that RD is the only appropriate choice, but this effect size is a reasonable choice in many clinical situations. Moreover, RD is easily translated to the ‘number needed to take’ (NNT = 1/RD), an effect size easily interpretable for clinical or public policy decision-making [28–31].
Suppose one could magically transfer subjects in the high risk group (Q1) to the low risk group (Q0). How many high risk patients would have to be transferred in the hope of preventing one case of D+? The answer to that question is NNT = 1/(Q1 − Q0). If NNT = 1, every high risk subject has the disorder and every low risk subject does not: as soon as one patient is transferred, one case may be prevented. If, on the other hand, one needs to transfer 3 or 30 or 3000 or even more high risk subjects to prevent one case, the clinical importance of the risk factor that defines ‘high risk’ becomes progressively weaker.
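As a minimal sketch of this interpretation, the following Python fragment computes NNT from Q1 and Q0; the risks and the function name are invented for illustration, not taken from the chapter:

```python
# A minimal sketch of NNT = 1/RD, with invented illustrative risks:
# Q1 and Q0 are the probabilities of D+ in the high and low risk groups.
def nnt(q1: float, q0: float) -> float:
    rd = q1 - q0                     # risk difference RD = Q1 - Q0
    return float("inf") if rd == 0 else 1 / rd

# If transferring a subject from high to low risk shifts their risk from
# Q1 to Q0, roughly NNT transfers are needed to prevent one case of D+.
print(nnt(q1=1.0, q0=0.0))    # 1.0: every transfer prevents a case
print(nnt(q1=0.5, q0=0.25))   # 4.0: about four transfers per case prevented
```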
The choice between odds ratio and RD (or NNT) is of no concern when association is random: in that case OR = 1, RD = 0 and NNT is infinite. There is also consistency when the probability of being in the high risk group equals the probability of having D+ in the population, for then

RD = (√OR − 1)/(√OR + 1) = 1/NNT.

Thus, under this condition, OR = 4 corresponds to RD = 1/3 (NNT = 3). Otherwise, NNT is always greater than (√OR + 1)/(√OR − 1) [31].
Thus OR = 4 may correspond to NNT = 3, or to NNT = 30, 300, 3000, . . . , which makes interpretation for public health purposes impossible.
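To see how one odds ratio maps onto many values of NNT, the sketch below (illustrative values only) fixes OR = 4, varies the low risk probability Q0, solves for Q1, and computes the resulting NNT:

```python
# Fix OR = 4 and vary Q0: the same odds ratio yields wildly different NNTs.
def q1_from_or(q0: float, odds_ratio: float) -> float:
    """Solve for Q1 given Q0 and the odds ratio."""
    odds1 = odds_ratio * q0 / (1 - q0)
    return odds1 / (1 + odds1)

for q0 in [1 / 3, 0.01, 0.001, 0.0001]:
    q1 = q1_from_or(q0, odds_ratio=4.0)
    nnt = 1 / (q1 - q0)
    print(f"Q0 = {q0:8.4f}  Q1 = {q1:8.4f}  NNT ~ {nnt:9.1f}")
# OR = 4 throughout, yet NNT runs from 3 to well over 3000.
```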
Why the odds ratio? If the magnitude of the odds ratio is so difficult to interpret for public health purposes, what arguments have been given supporting its use (other than as an indication of non-randomness)?
Epidemiologists often suggest that this is the statistic recommended by biostatisticians, and biostatisticians suggest that this is the statistic demanded by epidemiologists. If either claim is true, such recommendations in the absence of a sound scientific basis are questionable. The most common reason given is ‘because this is what we’ve always used’ or ‘this is what everyone uses’, that is, that it is the most commonly used measure of 2×2 association (Section 6.3.1). This claim is true, but again, leaves the scientific basis for such common use unclear.
Another reason is that, unlike many measures of 2×2 association, the odds ratio is symmetric in the roles of Y and X (see Section 6.3.1): reversing the roles of Y and X yields the same odds ratio. However, the weighted kappa placing equal weight on false positives and false negatives, and the phi coefficient, have the same property, yet generally yield very different conclusions. Similarly, it is noted that the odds ratio approximates another measure of association, the relative risk, but that claim is true only in a very low prevalence situation (a ‘rare disease’). In any 2×2 table there are four relative risks, and the odds ratio is always larger than the largest one.
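A quick numeric check of that last claim, using an invented 2×2 table (the counts and their orientation are mine, not the chapter’s):

```python
# Invented 2x2 counts; rows are high/low risk, columns are D+/D-.
a, b, c, d = 30, 20, 10, 40

odds_ratio = (a * d) / (b * c)        # OR = 6.0 here
relative_risks = [
    (a / (a + b)) / (c / (c + d)),    # P(D+ | high) / P(D+ | low)
    (d / (c + d)) / (b / (a + b)),    # P(D- | low)  / P(D- | high)
    (a / (a + c)) / (b / (b + d)),    # P(high | D+) / P(high | D-)
    (d / (b + d)) / (c / (a + c)),    # P(low | D-)  / P(low | D+)
]
print(f"OR = {odds_ratio:.2f}; RRs = {[round(r, 2) for r in relative_risks]}")
assert all(odds_ratio > rr for rr in relative_risks)  # OR exceeds all four
```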
Another, less often articulated reason is that the odds ratio is often big when most other effect sizes indicate a trivial effect, often stated as a claim that the odds ratio is more sensitive to deviations from randomness. As noted above, this is often true, but it leaves unanswered the question of whether the odds ratio is conveying the right or the wrong message. Often it is pointed out how easy the odds ratio is to compute, most particularly that it can be estimated equally well from a prospective naturalistic or stratified sample or from an unbiased case–control sample [32]. However, if the message conveyed by the odds ratio is wrong, ease of computation is not a valid scientific reason for its use.
In short, scientific support for the use of the odds ratio as an interpretable effect size seems to be lacking, although it must again be emphasised that it remains the index of choice in testing null hypotheses of randomness, and is very convenient for use in multivariate modelling, for example in the logistic regression model (Chapter 2).
Why not the odds ratio? The fundamental problem with the odds ratio lies in the fact that it is a ratio, very sensitive to the magnitude, and to the error of estimation, of a denominator that often approaches zero.
For example, suppose that underlying the categorical diagnoses D+ and D−, there is a dimensional diagnosis D [33], normally distributed with equal variances in the two groups, where a categorical diagnosis is obtained by dichotomising D at some cut-point. The effect size differentiating the two groups is δ = (μ1 − μ0)/σ, with μ1 and μ0 the means of the dimensional diagnoses in the high and low risk groups, and σ their common standard deviation. Then, where Φ(·) is the standard normal distribution function, and the cut-point c is measured in σ-units from the point halfway between the two means, (μ1 + μ0)/2, the odds ratio would be:

OR = [1 − Φ(c − δ)]Φ(c) / (Φ(c − δ)[1 − Φ(c)]).
This odds ratio is shown in Figure 6.1 for various cut-points c and for various values of δ. Clearly if δ = 0, OR = 1, the null value, regardless of the cut-point c (and accordingly RD = 0 and NNT is infinite). If δ ≠ 0, OR takes on its minimal value halfway between the two means, at which point RD = (√OR − 1)/(√OR + 1). For cut-points above and below this midpoint, OR monotonically increases to infinity (and RD monotonically decreases to 0). The crucial fact is that when δ > 0, one can get an odds ratio as large as one could possibly desire simply by dichotomising far enough into the tails of the distribution.
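The displayed formula is easy to evaluate directly. A sketch, assuming SciPy for the normal distribution function (the function and parameter names are mine):

```python
# Odds ratio from dichotomising a normal dimensional measure, following
# the displayed formula:
# OR = [1 - Phi(c - delta)]Phi(c) / (Phi(c - delta)[1 - Phi(c)]).
from scipy.stats import norm

def or_from_cutpoint(c: float, delta: float) -> float:
    q1 = 1 - norm.cdf(c - delta)   # P(D+) in the high risk group
    q0 = 1 - norm.cdf(c)           # P(D+) in the low risk group
    return (q1 / (1 - q1)) / (q0 / (1 - q0))

for c in [0.5, 1.0, 2.0, 3.0, 4.0, 5.0]:
    print(f"c = {c:3.1f}  OR = {or_from_cutpoint(c, delta=1.0):10.2f}")
# With delta fixed at 1.0, OR grows without bound as the cut-point moves
# into the tail, even though the separation between the groups is unchanged.
```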
From another perspective, one of the most important research uses of an effect size is power computation in planning a hypothesis-testing study.
However, as is well known, power computations cannot be done using the odds ratio as the effect size.
For example, in testing the simple hypothesis OR = 1 versus OR = 4 at the 5% level of significance, there is no sample size large enough to assure at least 80% power whenever OR = 4. This is because OR = 4 may mean that Q1 = 2/3 and Q0 = 1/3, in which case the necessary sample size per group would be 34, or it may mean that Q1 = 0.004 and Q0 = 0.001, in which case the necessary sample size would be 4294 per group (and even larger for smaller values of Q1 and Q0 having OR = 4).
Generally, to do power computations, users switch to other effect sizes such as RD. For example, OR = 4 corresponds at best to RD = 1/3. With 34 subjects per group, one has at least 80% power to detect any Q1 and Q0 pair with RD = 1/3.
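The exact sample sizes depend on the formula used; a standard normal-approximation sketch (my choice of formula, not necessarily the one behind the chapter’s figures) reproduces numbers of the same order:

```python
# Per-group sample size for comparing two proportions, normal approximation,
# two-sided alpha = 0.05, power = 0.80. Both (Q1, Q0) pairs below have OR = 4.
from scipy.stats import norm

def n_per_group(q1, q0, alpha=0.05, power=0.80):
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    p_bar = (q1 + q0) / 2
    root = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
            + z_b * (q1 * (1 - q1) + q0 * (1 - q0)) ** 0.5)
    return root ** 2 / (q1 - q0) ** 2

for q1, q0 in [(2 / 3, 1 / 3), (0.004, 0.001)]:
    print(f"Q1 = {q1:.3f}, Q0 = {q0:.3f}: n per group ~ {n_per_group(q1, q0):,.0f}")
# ~34 for the first pair and several thousand for the second: no single
# sample size guarantees 80% power for 'OR = 4' as such.
```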
[Figure 6.1: odds ratio (0 to 40) plotted against cut-point (−4.0 to 4.0), one curve for each of δ = 0.0, 0.2, 0.4, 0.6, 0.8 and 1.0.]
Figure 6.1 Values of the odds ratio obtained by dichotomising a hypothetical dimensional diagnosis having normal distributions with equal variance in the high risk and low risk subpopulations, where the effect size comparing those distributions is δ, the standardised mean difference between the two means. The cut-points are measured in standard deviation units from the point halfway between the means of the two distributions.
There are many other such arguments that raise questions about the value of the odds ratio as an interpretable effect size, and few, other than those based on custom or convenience, supporting its use as such.
6.2.4 Correlation versus interaction
A common source of confusion is that between two risk factors being correlated and two risk factors interacting. In Table 6.1, G and E are correlated if q1 ≠ q2, and then G and E are correlated regardless of which outcome is being considered in that population. On the other hand, G and E interact in their effect on a specific outcome D if the effect size relating E to D is different for those with G=1 than for those with G=0, or equivalently if the effect size relating G to D is different for those with E=1 than for those with E=0, that is, if

P11 − P10 − P01 + P00 ≠ 0.
G and E may interact with respect to one choice of D, but not with another. Thus ‘correlation’ refers to the relationship between two variables, ‘interaction’ to
the association of two variables (or more) to a third.
Interaction may exist with or without correlation;
correlation may exist with or without interaction.
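To make the distinction concrete, here is a small sketch with invented probabilities. Since Table 6.1 is not reproduced here, I assume q1 and q2 are the probabilities of E=1 among G=1 and G=0 subjects, and Pge the probability of D+ in the cell with G=g and E=e:

```python
# Invented illustrative probabilities: G and E uncorrelated here (q1 = q2),
# yet they interact in their effect on D.
q1, q2 = 0.30, 0.30                          # P(E=1 | G=1), P(E=1 | G=0)
P11, P10, P01, P00 = 0.40, 0.10, 0.25, 0.10  # P(D+ | G, E) for each cell

correlated = q1 != q2
interaction = (P11 - P10 - P01 + P00) != 0   # effect of E differs by G

print(f"correlated: {correlated}")           # False
print(f"interaction: {interaction}")         # True
# The RD for E is 0.30 among G=1 subjects but 0.15 among G=0 subjects:
# interaction without correlation, as the text notes is possible.
```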
That now sets the basis for consideration of moderation and mediation.