Systematic reviews and meta-analyses - Evidence in Mental Health Care

This has obvious advantages for clinical research and practice because it provides a method of minimising random error and producing more precise, and potentially more generalisable, results.

In this chapter we will discuss the benefits of conducting systematic reviews of the literature and the circumstances in which it is appropriate to perform a meta-analysis. Our view is that the results of systematic reviews, with or without meta-analyses, can reliably and efficiently provide the information needed for rational clinical decision making (Mulrow 1994). However, in our experience the results of a review are rarely unequivocal and require careful appraisal and interpretation. When using reviews, clinicians therefore need to integrate the results with clinical expertise and the patient’s preferences.

Systematic reviews

Why do reviews need to be systematic?

The literature is littered with unsystematic reviews, where the methods for identifying and appraising relevant studies are not explicit and it cannot be assumed that the methodology is adequate. The conclusions of such reviews must be viewed with suspicion as they may be misleading, though the extent to which they are unreliable is usually difficult to judge. In contrast, systematic reviews use explicit, and therefore reproducible, methods to limit bias and improve the reliability and accuracy of the conclusions (Mulrow 1994).

The first stage in conducting a systematic review is the formulation of a clear question. The nature of the question determines the type of research evidence to be reviewed and allows for the a priori specification of inclusion and exclusion criteria. For instance, to answer a question such as ‘in the treatment of depressive disorder, are dual action drugs more effective than selective serotonin reuptake inhibitors?’ the most reliable study design would be a randomised controlled trial (RCT). Randomisation avoids any systematic tendency to produce an unequal distribution of prognostic factors between the experimental and control treatments that could influence the outcome (Altman and Bland 1999). However, RCTs are not the most appropriate research design for all questions (Sackett and Wennberg 1997). A question such as ‘do obstetric complications predispose to schizophrenia?’ could not feasibly be answered by an RCT because it would be neither possible nor ethical to randomise subjects to be exposed to obstetric complications. This is a form of aetiological question that would be best addressed by primary cohort and case-control studies.

Likewise, a diagnostic question such as ‘how well can brief screening questions identify patients with depressive disorder?’ would be best answered by a cross-sectional study of patients at risk of being depressed. The rationale for systematic review is the same for all questions—the avoidance of random error and systematic bias. Systematic reviews have been useful in synthesising primary research results in both aetiology (see, for example, the review of the association between obstetric complications and schizophrenia by Geddes and Lawrie 1996) and diagnosis (see, for example, the review of case-finding instruments for depression by Mulrow and her colleagues 1995). Systematic reviews of these other study designs have their own methodological problems, and guidelines exist for undertaking reviews and meta-analyses of diagnostic tests (Irwig et al. 1994) and the observational epidemiological designs used in aetiological research (Stroup et al. 2000).

The Cochrane Collaboration

The recognition of the need for systematic reviews of RCTs, and the development of the scientific methodology of reviews, have been among the most striking developments in health services research over the last decade (Chalmers et al. 1992). The first UK Cochrane Centre was established in Oxford in 1992 as part of the information systems strategy developed to support the National Health Service Research and Development Programme; centres have since been established in several other countries. Within the Cochrane Collaboration, there are several active collaborative review groups in areas of practice relevant to mental health clinicians, including the Cochrane Schizophrenia Review Group and the Cochrane Depression, Anxiety and Neurosis Group.

Within psychiatry there are now over 100 Cochrane reviews.

The sources of bias in systematic reviews

Despite their potential to avoid bias, a number of factors can adversely affect the conclusions of a systematic review. When conducting a primary study it is important to ensure that the sample recruited is representative of the target population, otherwise the results may be misleading (selection bias). The most significant form of bias in systematic reviews is analogous to selection bias in primary studies, but applies to the selection of primary studies, rather than participants. There are various forms of selection bias including publication bias, language of publication bias and biases introduced by an over-reliance on electronic databases.

Publication bias

Publication bias is the tendency of investigators, reviewers and editors to differentially submit or accept manuscripts for publication based on the direction or strength of the study findings (Gottesman and Bertelsen 1989). The conclusions of systematic reviews can be significantly affected by publication bias.

The potential pitfalls of publication bias are obvious: if only studies that demonstrate a treatment benefit are published, the conclusions may be misleading if the true effect is neutral or even harmful. As early as 1959 it was noted that 97.3 per cent of articles published in four major journals had statistically significant results (Sterling 1959) although it is likely that many studies were conducted which produced non-significant results and these were less likely to be published.

Various strategies have been proposed to counter publication bias (Gilbody and Song 2000). These include methods aimed at detecting its presence and preventing its occurrence. It is generally accepted that prevention is likely to be the most effective strategy and it has been proposed that the most effective method would be to establish trial registries of all studies (Simes 1986). This would mean that a record of the trial or study would exist regardless of whether or not it was published and should reduce the risk of ‘negative’ studies disappearing. It should also facilitate the work of systematic reviewers. Although registries of ongoing research have been slow to become established (perhaps because it is not clear who should take a lead or fund them), some have been created, for example a registry of trials (see, for example, http://www.controlled-trials.com/) and the register of studies performed under the auspices of the UK National Health Service (http://www.doh.gov.uk/research/nrr.htm/). While such registries may be useful prospectively, they do not solve the problem of retrospectively identifying primary studies. There are a number of methods for estimating the likelihood of the presence of publication bias in a sample of studies.

One commonly used way of investigating publication bias is the funnel plot (Egger et al. 1997). In a funnel plot, the study-specific odds ratios are plotted against a measure of the study’s precision (such as the inverse of the standard error or the number of cases in each study). There will be more variation in the results of small studies because of their greater susceptibility to random error and hence the results of the larger studies, with less random error, should cluster more closely around the ‘true value’. If publication bias is not present, the graphical distribution of odds ratios should resemble an inverted funnel. If there is a gap in the region of the funnel where the results of small negative studies would be expected, then this would imply that the results of these studies are missing. This could be because of publication bias or it could mean that the search failed to find small negative studies. An example of a funnel plot from the meta-analysis of the association between obstetric complications and schizophrenia is shown in Figure 8.1. In this example it is clear that there is a gap in the distribution of odds ratios in the region of small negative studies. This would imply the presence of publication bias (Geddes and Lawrie 1996).
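The coordinates of a funnel plot can be derived from each study's 2×2 table. The following is a minimal sketch, not the method used by Geddes and Lawrie: the counts are invented for illustration, the function name `log_odds_ratio` is hypothetical, and the choices of the log scale and of the large-sample (Woolf) standard error are assumptions made here rather than taken from the chapter.

```python
import math

def log_odds_ratio(a, b, c, d):
    """Log odds ratio and its standard error for a 2x2 table.

    a, b = exposed and unexposed cases; c, d = exposed and unexposed controls.
    The standard error is the usual large-sample (Woolf) approximation.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se

# Invented 2x2 counts for three hypothetical studies of increasing size.
studies = {
    "small": (12, 18, 8, 22),
    "medium": (40, 60, 30, 70),
    "large": (200, 300, 150, 350),
}

# One funnel-plot point per study: (log odds ratio, precision = 1 / SE).
funnel_points = {}
for name, table in studies.items():
    log_or, se = log_odds_ratio(*table)
    funnel_points[name] = (log_or, 1 / se)

for name, (log_or, precision) in funnel_points.items():
    print(f"{name}: log OR = {log_or:.2f}, precision = {precision:.2f}")
```

Plotting the first coordinate on the horizontal axis and the second on the vertical axis gives the funnel shape described above: the larger the study, the higher its point sits and the less it should scatter around the pooled value.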

Strenuous attempts to avoid publication bias can cause other problems, because they may require systematic reviews to incorporate unpublished data that may not have been subject to peer review, and this has been controversial. Cook and colleagues studied attitudes amongst meta-analysts and journal editors and found that meta-analysts were more inclined to use unpublished data than journal editors (Cook et al. 1993). Indeed, 30 per cent of editors questioned would not publish overviews that included unpublished data. The journal editors argued that unpublished data was more likely to be subject to fraud or distortion and was less scientifically rigorous. It is, however, becoming increasingly recognised that publication in a peer-review journal, even a prestigious one, is no guarantee of scientific quality (Oxman et al. 1991). Unpublished data can nevertheless be problematic, as the data supplied may not be a full or representative sample (Cook et al. 1993). Reliance only on published data may also be problematic and, in many situations, it is useful to obtain the disaggregated original data, though this dramatically increases the size of the task of reviewing (Clarke and Stewart 1994; Geddes et al. 1999b).

Nowhere is the impact of publication bias more obvious than in a meta-analysis. Using the ‘trim and fill’ method to evaluate bias in funnel plots, Sutton et al. found that in 8 per cent of reviews in the Cochrane Database of Systematic Reviews, the statistical inferences regarding the effect of the intervention were changed (Sutton et al. 2000). The ‘trim and fill’ method adjusts for any asymmetry in a funnel plot in calculating the pooled estimate of effect. Although this method is controversial, the study does demonstrate the potential impact of publication bias on the estimate of effect size.

Figure 8.1 Funnel plot. The study-specific odds ratios have been plotted against a measure of the study’s power (in this example, the number of cases). The diamonds represent case-control studies and the circles cohort studies. There is an absence of small case-control studies with odds ratios close to (or less than) 1. By chance, such studies should exist and their absence implies a degree of publication bias (from Geddes and Lawrie, 1996).

Language of publication bias (Tower of Babel bias)

Restricting a search to one language (for example searching only for English-language papers) can be hazardous. It has been shown that studies that find a treatment effect are more likely to be published in English-language journals, whilst opposing studies may be published in non-English-language journals (Egger and Smith 1998). Language of publication bias has also been called the ‘Tower of Babel’ bias (Gregoire et al. 1995).

Uncritical use of electronic databases

With the increasing availability of convenient electronic bibliographic databases, there is a danger that reviewers may rely on them unduly. This can cause bias because electronic databases do not offer comprehensive or unbiased coverage of the relevant primary literature. Adams investigated the adequacy of Medline searches for RCTs in mental health care and found that the optimal Medline search had a sensitivity of only 52 per cent (CI 48–56 per cent) (Adams et al. 1994). Sensitivity can be improved by searching other databases in addition to Medline, for example Embase, PsycLit, Psyndex, CINAHL and Lilacs.

To avoid the limitations of relying on electronic databases—or any other resource—for the identification of primary studies, reviews undertaken under the auspices of the Cochrane Collaboration seek to use optimally sensitive, over-inclusive searches to identify as many studies as possible with a combination of electronic searching, handsearching, reference checking and personal communications (Cochrane Collaboration 1995).

Meta-analysis

While a meta-analysis is not an essential part of a systematic review, a meta-analysis should only be performed in the context of a systematic review. The technique of meta-analysis has been controversial, perhaps because it is potentially so powerful yet also particularly susceptible to abuse. Here, we review a number of key issues in meta-analysis.

Increasing statistical power by pooling the results of individual studies

Many trials, especially in psychiatry, are small (Johnson 1983; Thornley and Adams 1998). There are many difficulties in recruiting patients to randomised trials, although some of these difficulties have been minimised in other areas of medicine by simplification of trial procedures, allowing large scale, definitive trials (Peto et al. 1993).

Although there is a need for larger randomised trials in psychiatry, it is also important to make the most efficient use of the trials that have been completed. Meta-analysis is a tool that can increase sample size, and consequently statistical power, by pooling the results of individual trials. Increasing the sample size also allows for more precision (i.e. a more precise estimate of the size of the effect with narrower confidence intervals).
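The gain in precision from pooling can be illustrated with inverse-variance weighting under a fixed-effect model. This is one common pooling method, chosen here as an assumption since the chapter does not specify one; the effect estimates and standard errors below are invented, and `pool_fixed_effect` is a hypothetical name.

```python
import math

def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance weighted (fixed-effect) pooled estimate and its SE."""
    weights = [1 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    return pooled, pooled_se

# Invented log odds ratios and standard errors for three hypothetical trials.
log_ors = [0.60, 0.35, 0.45]
ses = [0.40, 0.25, 0.30]

pooled, pooled_se = pool_fixed_effect(log_ors, ses)

# Approximate 95 per cent confidence interval on the log odds ratio scale.
ci_low = pooled - 1.96 * pooled_se
ci_high = pooled + 1.96 * pooled_se

print(f"pooled log OR = {pooled:.3f}, SE = {pooled_se:.3f}")
print(f"95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

Because the pooled standard error is the inverse square root of the summed weights, it is always smaller than the standard error of any single contributing trial, which is the sense in which pooling yields narrower confidence intervals.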

Assessing variation between studies

Combining studies, however attractive, may not always be appropriate. As Eysenck and others have pointed out, the inappropriate pooling of disparate studies can be likened to combining oranges and apples: the result is meaningless (Eysenck 1994). There are two main approaches to dealing with variations in the primary studies. First, it is necessary to ensure that the individual studies are really looking at the same clinical or research question. Individual studies might vary with respect to study participants, intervention, duration of follow up and outcome measures. Deciding whether studies are similar enough will usually require a measure of judgement, and for this reason a reviewer should always pre-specify the main criteria for including primary studies in the review. In Cochrane reviews, a protocol is first developed, and the inclusion criteria, outcomes of interest and proposed methods of analysis are described. This is peer reviewed before the review can go ahead (Cochrane Collaboration 1995). Second, having decided that the primary studies are investigating the same, or a close enough, question, an important role of meta-analysis is to investigate variations between the results of individual studies (heterogeneity). When such variation exists, it is useful to estimate whether more heterogeneity exists than can be reasonably explained by the play of chance alone. If so, attempts should be made to identify the reasons for such heterogeneity, and it may then be decided that it is not reasonable to combine the studies, or that it is, but that the overall pooled estimate needs to take the variation into account (Thompson 1994).
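One widely used way of judging whether the variation between studies exceeds what chance would explain is Cochran's Q statistic, often summarised as the I² proportion. The sketch below is an illustration under assumptions: the chapter does not name these statistics, the input values are invented, and `heterogeneity` is a hypothetical function name.

```python
def heterogeneity(estimates, standard_errors):
    """Cochran's Q, its degrees of freedom, and the I-squared proportion."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    # Q is the weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    # I-squared: share of total variation beyond that expected by chance.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i_squared

# Invented example: two equally precise hypothetical studies that disagree.
q, df, i_squared = heterogeneity([0.0, 1.0], [0.1, 0.1])
print(f"Q = {q:.1f} on {df} df, I-squared = {i_squared:.2f}")
```

Under the hypothesis of no heterogeneity, Q is compared with a chi-squared distribution on the stated degrees of freedom; a value well above the degrees of freedom suggests that the studies differ by more than chance alone.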

Investigating the impact of study quality

Individual studies vary in their methodological quality. Most has been written about the factors that impact on the quality of randomised trials. It has been shown that allocation concealment and randomisation, blinding or masking, and whether or not an intention-to-treat analysis was performed (i.e. all participants are considered and they are analysed in the groups to which they were randomly allocated) can all affect the direction of the results (Schulz et al. 1995). Studies that are deficient in any of these areas tend to overestimate the effect of the intervention. This presents a dilemma to the meta-analyst, who must determine to what extent the variations in methodological quality threaten the combinability of the data.

Many scales have been developed to assess the methodological quality of randomised trials. The scores on these quality assessment scales may be incorporated into the design of a meta-analysis with a ‘quality weight’ applied to the study-specific effect estimate. Although a huge number of quality scales are available, the current consensus is that their use is problematic because of uncertain validity. It has also been shown that using different scales leads to substantial differences in the pooled estimate (Juni et al. 1999). The optimal approach at present is to assess qualitatively those aspects of trial design that affect its internal validity (allocation concealment, blinding, study attrition and method of analysis). Sensitivity analyses can then be conducted to investigate the effect of excluding poor quality trials.
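A sensitivity analysis of this kind amounts to repeating the pooling with and without the trials judged to be of poor quality and comparing the two estimates. The sketch below assumes inverse-variance (fixed-effect) pooling and uses invented trials with a single hypothetical quality flag, `adequate_concealment`; the pattern in the numbers is constructed for illustration and is not a result.

```python
def pooled_estimate(trials):
    """Inverse-variance weighted pooled effect for a list of trial records."""
    weights = [1 / t["se"] ** 2 for t in trials]
    weighted_sum = sum(w * t["effect"] for w, t in zip(weights, trials))
    return weighted_sum / sum(weights)

# Invented trials: effect is a log odds ratio, se its standard error.
trials = [
    {"name": "A", "effect": 0.30, "se": 0.20, "adequate_concealment": True},
    {"name": "B", "effect": 0.25, "se": 0.15, "adequate_concealment": True},
    {"name": "C", "effect": 0.80, "se": 0.35, "adequate_concealment": False},
    {"name": "D", "effect": 0.70, "se": 0.30, "adequate_concealment": False},
]

# Primary analysis: every trial. Sensitivity analysis: adequate trials only.
pooled_all = pooled_estimate(trials)
adequate = [t for t in trials if t["adequate_concealment"]]
pooled_adequate = pooled_estimate(adequate)

print(f"all trials:          {pooled_all:.3f}")
print(f"adequate trials only: {pooled_adequate:.3f}")
```

If the two estimates differ materially, the conclusion of the review depends on the weaker trials and should be interpreted with corresponding caution.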

Investigating patient subgroups

The main treatment effect of a trial gives an indication of the average response for an average patient meeting the inclusion criteria. Individual patients in real-life clinical practice deviate from the average to greater or lesser degrees. To tailor the results of a trial to an individual patient it is tempting to perform a sub-analysis of the trial participants with a specific characteristic or set of characteristics.

As we have already discussed, many trials conducted in psychiatry are small. Further division of these trials into subgroups reduces the sample size even more and consequently the statistical power of the results. Inevitably, estimates of the treatment effect in subgroups of patients are more susceptible to random error, and therefore imprecision, than the estimate of the average effect for all patients overall (Counsell et al. 1994).

Furthermore, unless the randomisation was initially stratified according to the important subgroups, the protection from confounding afforded by randomisation will not apply and any observed subgroup difference in treatment effect may be caused by confounding. Meta-analysis pools data from individual studies, with a consequent increase in power, and this may make subgroup analyses more reliable.

However, it should be emphasised that meta-analysis on its own cannot prevent systematic error in the analysis of subgroups. Subgroup analyses should therefore always be viewed cautiously. Clearly, dredging the results of a meta-analysis to identify post-hoc subgroups can still lead to erroneous results through random error.
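Two pieces of arithmetic lie behind this caution. The sketch below uses invented response rates and sample sizes, with hypothetical function names, and assumes independent tests at a 5 per cent significance level; it shows how the standard error grows when a trial is cut into a subgroup, and how the chance of at least one spurious ‘significant’ finding grows with the number of subgroups examined.

```python
import math

def se_risk_difference(p_treat, p_control, n_per_arm):
    """Standard error of a difference in proportions with equal arm sizes."""
    variance = (p_treat * (1 - p_treat) + p_control * (1 - p_control)) / n_per_arm
    return math.sqrt(variance)

def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability of at least one false positive among independent tests."""
    return 1 - (1 - alpha) ** n_tests

# Invented trial: 60 vs 40 per cent response, 100 patients per arm.
se_full = se_risk_difference(0.6, 0.4, 100)
# A hypothetical subgroup containing one quarter of the patients.
se_subgroup = se_risk_difference(0.6, 0.4, 25)

print(f"SE, whole trial: {se_full:.3f}")
print(f"SE, subgroup:    {se_subgroup:.3f}")
print(f"P(any false positive in 10 tests): {chance_of_false_positive(10):.2f}")
```

A subgroup a quarter of the size has twice the standard error, and ten independent post-hoc comparisons carry roughly a 40 per cent chance of producing at least one falsely significant result.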

Conclusion

Systematic reviews that seek to synthesise all relevant studies in response to a clearly defined clinical question are an invaluable resource for both clinicians and researchers. Not all reviews, however, are systematic, and even those that are described as ‘systematic’ may be deficient methodologically. Strategies have been developed to reduce bias in systematic reviews, especially publication bias and the biases introduced through the over-reliance on electronic databases. A high quality systematic review, by definition, provides the best available evidence on a specific topic.

The usefulness of a systematic review can be further enhanced by the calculation of a statistical summary of the results by the techniques of meta-analysis. By pooling the results of the studies, the risk of random error is reduced, both increasing statistical power and allowing for a more precise estimate of effect size. It must not be forgotten, however, that whilst pooling the results reduces random error, it does not eliminate it. Consequently, analyses are still susceptible to both type I and type II errors. When critically appraising a meta-analysis it is necessary not only to assess whether statistical combination was appropriate, but also to investigate the significance and robustness of the results.
