Table III. Chemical substances in foods and drinks to illustrate possible heterogeneity in dietary interventions11.

| Class | Nutrient group | Nutrient/substance |
|---|---|---|
| Nutrients | | |
| Macronutrients (provide energy) | Protein | 21 amino acids |
| | Carbohydrate | Saccharides*, starch, non-starch polysaccharides (dietary fibre) |
| | Fat | Glycerol, cholesterol, fatty acids (different chain lengths, saturation)# |
| Micronutrients | Water-soluble vitamins | C and B-group |
| | Fat-soluble vitamins | A, D, E and K |
| | Minerals | Ca, Fe, Zn, Na, Mg, K, I etc. |
| | Trace elements | Cu, Se, Mn etc. |
| Water | Water | Water |
| Non-nutrients | | |
| Phytochemicals (thousands of different compounds) | | Phytoestrogens, genistein, daidzein, lignans, indoles, dithiothiones etc., phenolic acids, resveratrol, tannins etc. |
| Other non-nutrients | | Pigments, alcohol, additives etc. |

* Mono-, di-, oligo- and polysaccharides (sugars and starch); non-starch polysaccharides (dietary fibre)
# omega-3, -6 and -9 fatty acids (with chain lengths from 12 to 22)

There are various systems available and in use to rate the quality of scientific evidence2, 5, 12. The quality and grading of evidence will determine the strength of recommendations for policy and practice, and therefore the confidence that adherence to the recommendation will do more good than harm2. The American College of Cardiology together with the American Heart Association’s Task Force on Practice Guidelines13 promote a system of Classification of Recommendations (see Table IV) combined with three levels of evidence (see Table V) for clinical practice. Probably the most experienced body to grade evidence from nutritional studies is the panel responsible for the global report on Food, Nutrition and the Prevention of Cancer. In grading the scientific evidence “as a basis for dietary recommendations designed to prevent cancer” the panel used uniform methods and terminology. In their first report the panel used the terms convincing, probable, possible and insufficient to summarise a body of evidence. The panel further rated the relative strengths of associations between diet and cancer in individual studies as strong (relative risk or odds ratio >2.0 or <0.5 and statistically significant), moderate (relative risk or odds ratio >2.0 or <0.5 but not statistically significant, or else 1.5–2.0 or 0.5–0.75 and statistically significant) and weak (relative risk or odds ratio is 1.5–2.0 or 0.5–0.75 but not statistically significant)9.
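The strong, moderate and weak categories for individual studies are defined by simple thresholds, so they can be written as a small function. The following is a minimal sketch in Python and is not part of the panel's report. The function name `association_strength`, the placement of boundary values (exactly 2.0 or 0.5 are put in the intermediate band) and the `None` result for estimates between 0.75 and 1.5 are assumptions made for illustration, since the quoted definitions do not cover those cases.

```python
from typing import Optional


def association_strength(estimate: float, significant: bool) -> Optional[str]:
    """Rate a relative risk or odds ratio as strong, moderate or weak.

    `estimate` is the relative risk or odds ratio from a single study;
    `significant` states whether it is statistically significant.
    """
    if estimate <= 0:
        raise ValueError("a relative risk or odds ratio must be positive")

    # Large effect: >2.0 (increased risk) or <0.5 (decreased risk).
    large = estimate > 2.0 or estimate < 0.5
    # Intermediate effect: 1.5-2.0 or 0.5-0.75.
    # Assumption: the boundary values 2.0 and 0.5 are placed in this band.
    intermediate = 1.5 <= estimate <= 2.0 or 0.5 <= estimate <= 0.75

    if large:
        return "strong" if significant else "moderate"
    if intermediate:
        return "moderate" if significant else "weak"
    # Assumption: estimates between 0.75 and 1.5 are left unrated,
    # because the quoted scheme does not assign them a category.
    return None


if __name__ == "__main__":
    for rr, sig in [(2.5, True), (0.4, False), (1.7, True), (0.6, False)]:
        print(rr, sig, association_strength(rr, sig))
```

Under these assumptions, a statistically significant relative risk of 1.7 is rated moderate, whereas the same estimate without statistical significance is rated weak.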
In their more recent report9 the panel added numerical grades: grade 1 for convincing evidence, grade 2 for probable, grade 3 for possible and grade 4 for insufficient evidence. The panel described the requirements for evidence to fall into these grades as follows:
Grade 1 – Convincing evidence
A “convincing” relationship should be robust enough to be extremely unlikely to change over time, as new evidence accrues. The panel required the following characteristics:
■ Evidence from more than one study type (epidemiological, experimental or clinical trial), with more than one prospective cohort study of sufficient duration;
■ Presence of a plausible biological gradient (“dose response”) in the association (such a gradient need not be linear or even in the same direction across the range of exposure);
■ Evidence from experimental studies demonstrating one or more plausible mechanisms actually operating in humans;
■ No significant qualitative heterogeneity between studies of different types, or in different populations or regions that could not be reasonably explained;
■ Good quality studies to exclude with confidence the possibility that the observed association results from residual confounding;
■ All reasonable alternate explanations for observed associations to be excluded (that is, the relationship found to be specific to the particular exposure and outcome).
Table IV. Classification of recommendations for evidence-based clinical treatment13

| Level | Description |
|---|---|
| Class I | Conditions for which there is evidence and/or general agreement that a given procedure or treatment is useful and effective. |
| Class II | Conditions for which there is conflicting evidence and/or a divergence of opinion about the usefulness/efficacy of a procedure or treatment. |
| Class IIa | The weight of the evidence/opinion is in favour of usefulness/efficacy. |
| Class IIb | The usefulness/efficacy is less well established by evidence/opinion. |
| Class III | Conditions for which there is evidence and/or general agreement that the procedure/treatment is not useful/effective, and in some cases, may be harmful. |

All other levels of evidence were defined according to the degree to which the evidence did not meet the above standards. It has not been possible to define precisely the specific deficits which might lead to one or another grading. Failure to achieve a higher grade might result from the accumulation of several small deficits against a number of standards, or from a major lack in one particular aspect of evidence. The definitions below give illustrations of the types of deficit that might lead to an association being judged less likely to be causal.
Grade 2 – Probable evidence
■ There must be evidence from more than one study type, but there may be only one prospective cohort study;
■ There may be unexplained heterogeneity between study types;
■ There may be absence of one of a number of characteristics, e.g. a plausible dose response relationship, or evidence of a plausible mechanism operating in humans;
■ There may simply not be a large enough body of good quality evidence to be able to exclude the possibility of residual confounding or other alternate explanations.
Grade 3 – Possible evidence
■ There may be evidence from only one type of study, or from different study types but no prospective data;
■ There is unexplained heterogeneity especially within but also between study types;
■ The quality of studies may be inadequate;
■ There may be absence of evidence for a plausible mechanism operating in humans, or there may be evidence for a mechanism unsupported by observational data;
■ A dose response may not be observed;
■ The body of evidence is inadequate in size or quality to exclude confounding, or other reasons for the observed association.
Grade 4 – Insufficient evidence
■ There is insufficient evidence from any study type to draw firm conclusions;
■ A plausible mechanism has not been demonstrated to operate in humans;
■ Features characteristic of a causal relationship (e.g. dose response, homogeneity, specificity of association) are absent;
■ Confounding or other reasons for the observed association remain likely.
This system has been developed to evaluate evidence of relationships between dietary exposures and health outcomes. It seems largely suitable for judging the evidence on the role of nutrition in preventing and treating TB and HIV/AIDS.
Gray and Gray10 suggested a hierarchy of evidence with 10 subclasses.
Table V. Level of evidence (for recommendations)13

| Level | Description |
|---|---|
| A | Data derived from multiple randomised clinical trials. |
| B | Data derived from a single randomised trial, or non-randomised studies. |
| C | Consensus opinion of experts (no research studies to support). |

Atkins and co-workers5, part of the GRADE Working Group, concluded that all of these systems had important shortcomings. The GRADE Working Group2 proposed an alternative, simplified system for grading the quality of evidence, namely:
■ High: further research is unlikely to change our confidence in the estimate of effect.
■ Moderate: further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
■ Low: further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
■ Very low: any estimate of effect is very uncertain.