CHAPTER 13: Interpretation
13.1: Appraisal of a Single Study
alone?” and this issue is usually assessed by calculating the p-value.
This is the probability (assuming that there are no biases) that a test statistic as large as, or larger than, that actually observed would be found in a study if the null
hypothesis were true, i.e. that there was in reality no causal effect of
exposure. However, recent reviews have stressed the limitations of p-values and significance testing (Rothman, 1978;
Gardner and Altman, 1986; Poole, 1987; Pearce and Jackson, 1988).
Foremost among these is that
significance testing attempts to reach a decision on the basis of the data from a single study, whereas what is more important is the strength and precision of the effect estimate and whether the findings of a particular study are consistent with those of previous studies. These issues are better addressed by calculating confidence intervals rather than p-values (Gardner and Altman, 1986; Rothman and Greenland, 1998). Similarly, the
possibility that the lack of a statistically significant association could be due to lack of precision (lack of study power) is more appropriately addressed by
considering the confidence interval of the effect estimate rather than by making post hoc power calculations (Smith and Bates, 1992).
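For illustration, the confidence interval for a rate ratio can be computed directly from the study data. The following sketch uses hypothetical figures (the counts and person-years are invented for this example) and a standard Wald-type approximation on the log scale; it shows how a "non-significant" finding can nevertheless be consistent with a substantial effect when the interval is wide.

```python
import math

def rate_ratio_ci(cases_exp, pyrs_exp, cases_unexp, pyrs_unexp, z=1.96):
    """Wald-type 95% confidence interval for a rate ratio.

    The standard error of ln(RR) is approximated as
    sqrt(1/cases_exposed + 1/cases_unexposed).
    """
    rr = (cases_exp / pyrs_exp) / (cases_unexp / pyrs_unexp)
    se_log = math.sqrt(1 / cases_exp + 1 / cases_unexp)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Hypothetical study: 30 cases in 10,000 exposed person-years,
# 20 cases in 10,000 non-exposed person-years.
rr, lower, upper = rate_ratio_ci(30, 10000, 20, 10000)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
# prints "RR = 1.50, 95% CI 0.85-2.64"
```

Here the p-value would exceed 0.05, but the interval shows that the data are compatible with anything from a small protective effect to a more than doubling of risk, which is far more informative than the bare verdict "not significant".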
What are the likely strengths and directions of possible biases?
Systematic error is distinguished from random error in that it would be present even with an infinitely large study, whereas random error can be reduced by increasing the study size. Thus, systematic error, or "bias", occurs if there is a systematic difference between what the study is actually estimating and what it is intended to estimate. The types of bias (confounding, selection bias, information bias) have already been discussed in chapter 6. In the current context the key issue is that any
epidemiologic study will involve biases.
The problem is not to identify possible biases (these will almost always exist), but rather to ascertain what direction they are likely to be in, and how strong they are likely to be.
Confounding
In assessing whether an observed
association could be due to confounding, the first consideration is whether all potential confounders have been appropriately controlled for or appropriately assessed (e.g. by collecting and using confounder information in a sample of study participants). If not, it is essential to assess the potential strength and direction of uncontrolled confounding.
In some areas of epidemiologic research, e.g. occupational and
environmental studies, the strength of uncontrolled confounding is often less than might be expected. For example, Axelson (1978) has shown that for plausible estimates of the smoking prevalence in occupational populations, confounding by smoking can rarely account for a relative risk of lung cancer of greater than 1.5. Similarly,
Siemiatycki et al (1988) have found that confounding by smoking is generally even weaker for internal comparisons in which exposed workers are compared with non-exposed workers in the same factory or industry. On the other hand, the potential for confounding can be severe in studies of lifestyle and related factors (e.g. diet, nutrition, exercise).
It is unreasonable to simply assume that a strong association could be due to confounding by unknown risk factors, since to be a strong confounder a factor must be a very strong risk factor as well as being strongly associated with
exposure. For example, if an
occupational study found a relative risk of 2.0 for lung cancer in exposed
workers, it is highly unlikely that this could be due to confounding by
smoking, and it would be unreasonable to dismiss the study findings merely because smoking information had not been available. On the other hand, small relative risks (e.g. those in the range of 0.7-1.5, as frequently occur in dietary studies) are not so difficult to explain by lack of measurement, or poor measurement and control, of
confounders.
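The reasoning of Axelson (1978) can be sketched with a simple indirect-adjustment calculation. Under the assumption that the exposure has no true effect, the relative risk produced by confounding alone depends only on the prevalence of the confounder in the exposed and comparison populations and on the confounder's own relative risk. The prevalences and the smoking relative risk of 10 below are illustrative values, not figures from any particular study.

```python
def confounded_rr(p_exp, p_ref, rr_conf):
    """Relative risk that confounding alone would produce, assuming
    the exposure under study has no true effect (indirect adjustment).

    p_exp, p_ref: prevalence of the confounder (e.g. smoking) in the
    exposed and reference populations; rr_conf: the confounder's own
    relative risk for the disease.
    """
    return (p_exp * rr_conf + (1 - p_exp)) / (p_ref * rr_conf + (1 - p_ref))

# Plausible difference: 60% vs 40% smokers, smoking RR for lung cancer = 10.
print(round(confounded_rr(0.60, 0.40, 10), 2))  # prints "1.39"

# Even an extreme (implausible) difference of 70% vs 30% smokers:
print(round(confounded_rr(0.70, 0.30, 10), 2))  # prints "1.97"
```

Thus, even with implausibly large differences in smoking prevalence, confounding by smoking can barely reach a relative risk of 2.0, which is why an observed relative risk of that size cannot reasonably be dismissed on these grounds alone.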
Selection bias
Whereas confounding generally involves biases inherent in the source population, selection bias involves biases arising from the procedures by which the study subjects are chosen from the source population. As with confounding, if it is not possible to directly control for selection bias, it still may be possible to assess its likely strength and direction. It is
unreasonable to dismiss the findings of a particular study because of possible selection bias, without at least
attempting to assess which direction the possible selection bias would have been in, and how strong it might have been.
Information bias
With regard to information bias, the key issue is whether misclassification is likely to have been differential or non-differential. In the latter case, the bias will usually be in a known direction, i.e. towards the null. If
misclassification has been differential, then it is important to attempt to assess what direction the bias is likely
to have been in. The important issue is not whether information bias could have occurred (this is almost always the case since there are almost always problems of misclassification of
exposure and/or disease) but rather the likely direction and strength of such bias. In particular, if a study has yielded a positive finding (i.e. an effect estimate markedly different from the null value) then it is not valid to dismiss it because of the possibility of non-differential misclassification, or differential misclassification that is likely (although not guaranteed) to produce a bias towards the null.
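The tendency of non-differential exposure misclassification to bias the odds ratio towards the null can be shown with a small worked calculation. The case-control counts, sensitivity and specificity below are hypothetical, chosen only to illustrate the direction of the bias; the same sensitivity and specificity are applied to cases and controls (i.e. the misclassification is non-differential).

```python
def misclassify(n_exp, n_unexp, sens, spec):
    """Expected counts classified as exposed/unexposed, given the
    sensitivity and specificity of the exposure measurement."""
    obs_exp = n_exp * sens + n_unexp * (1 - spec)
    return obs_exp, (n_exp + n_unexp) - obs_exp

# True data: cases 60 exposed / 40 unexposed; controls 30 / 70.
# True odds ratio = (60*70)/(40*30) = 3.5.
a, b = misclassify(60, 40, sens=0.8, spec=0.9)  # cases
c, d = misclassify(30, 70, sens=0.8, spec=0.9)  # controls (same sens/spec)
observed_or = (a * d) / (b * c)
print(round(observed_or, 2))  # prints "2.41" - biased towards the null
```

With imperfect but non-differential measurement, the expected observed odds ratio falls from 3.5 to about 2.4; a positive finding therefore cannot be explained away by this type of misclassification, which tends to produce false negative rather than false positive findings.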
Summary of Issues of Systematic Error
In summary, when assessing whether the findings of a particular study could be due to such biases, the important issue is not whether such biases are likely to have occurred (since they will almost always be present to some extent), but rather what their direction and strength is likely to be, and
whether, taken together, they could explain the observed association. In particular, epidemiological studies are often criticized on the grounds that observed associations could be due to uncontrolled confounding or errors in the classification of exposure or
disease. However, the likely strength of uncontrolled confounding is
sometimes less than might be expected, and non-differential misclassification of exposure will usually (though not always) produce a tendency for false negative findings rather than false positive findings.