Abstract

Facing an enormous influx of information from medical research, clinicians need to differentiate robust study findings from spurious ones and to decide which results they can use with high confidence and which they should treat with more skepticism. Epidemiology provides guidelines for critical appraisal of the literature (Guyatt et al., Users' Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice, McGraw Hill, 2008; Wang and Wong, "Epidemiology and clinical research," in Albert and Jakobiec's Principles and Practice of Ophthalmology, 3rd ed., Saunders, 2008: 379-388). This series aims to equip clinicians with the basic skills to analyze scientific evidence from the literature and to use high-quality evidence wisely to guide their clinical practice while avoiding being misled. The purpose of epidemiology is to establish associations, which may be causative or may reveal clues to causation. Does the presence of a particular factor lead to a greater risk of disease? Does a treatment lead to a greater chance of a good outcome? In answering these questions using animal or cell-culture models, one can have almost complete control over experimental conditions; in humans, however, one cannot have the same degree of experimental control, for ethical reasons. Epidemiology can be seen, then, as the science of inferring association or causation in humans under so-called "messy" real-world conditions, when one can only observe (observational study designs) or intervene to a limited degree (interventional study designs), rather than manipulate experimentally. Several different study designs are used in research conducted in humans.
To study causes or exposures known to be harmful, it is neither ethical nor feasible to use an experimental design; for example, one cannot ask one group to start smoking and another to abstain from smoking to study whether smoking causes age-related macular degeneration (AMD). Observational studies do not interfere in human subjects' choice of exposure; they assess outcomes in subjects who were or were not exposed to the factors of interest. These designs include surveys, case-control studies, and cohort studies (all with controls), as well as case series (without controls). In surveys, exposures and disease outcomes are assessed at the same time, that is, cross-sectionally. Surveys simultaneously collect data on multiple exposures and outcomes for exploration of associations. The associations assessed should be guided by sound hypotheses and should be seen as hypothesis generating. A major drawback of surveys is that temporality (the exposure must precede the effect or outcome), a key component of causation, cannot be established. Surveys conducted in representative population-based samples, such as the baseline surveys of the Beaver Dam Eye Study, the Rotterdam Study, or the Blue Mountains Eye Study, can provide estimates of the frequency of diseases at a particular point in time, regardless of when the diseases developed; this is termed prevalence. It is calculated as the proportion of subjects with the disease at a particular point in time out of the total number of subjects surveyed at that time. Prevalence differs from incidence, which can be provided only by longitudinal studies and refers to the proportion of subjects in whom the disease develops over a defined period, out of the total number of subjects who were free of the disease at the beginning of the period. When studying rare diseases or diseases with long latency, it makes sense to start with groups who do (cases) and do not (controls) have the outcome of interest and to investigate the exposures retrospectively.
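The arithmetic behind prevalence and incidence is simple enough to sketch directly. In the snippet below, all counts are hypothetical, invented purely for illustration:

```python
# Hypothetical counts, invented purely for illustration.
surveyed = 4000            # subjects examined in a cross-sectional survey
with_disease = 220         # of those, subjects who have the disease at the survey

# Prevalence: proportion with the disease at one point in time.
prevalence = with_disease / surveyed
print(f"Prevalence: {prevalence:.1%}")                              # 5.5%

# Incidence requires a longitudinal design: start with disease-free
# subjects and count new cases over a defined follow-up period.
disease_free_at_baseline = 3780
new_cases_over_5_years = 151

cumulative_incidence = new_cases_over_5_years / disease_free_at_baseline
print(f"5-year cumulative incidence: {cumulative_incidence:.1%}")   # 4.0%
```

Note that the denominators differ: prevalence divides by everyone surveyed, whereas incidence divides only by those who were disease-free at baseline.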
The advantage of this design is also its biggest drawback: in assessing exposures retrospectively, cases may overreport exposures relative to controls (recall bias). Where and how the control group is selected for a series of cases also may affect the study findings (potential selection bias). The drawbacks of case-control studies can be addressed by using cohort studies. Cohort studies are appropriate for study questions about disease causes or prognosis: disease incidence or prognosis can be assessed during follow-up among subjects with and without the exposures of interest. Cohort designs are not feasible where the disease is rare or the latency to disease is long. Failure to follow up a large number of study subjects is likely to introduce selection bias; for example, subjects with better or worse outcomes may be more likely to be followed up than others (differential loss to follow-up). Table 1 provides a comparison of case-control and cohort study designs. Case-control studies can be nested within population-based surveys or cohort studies.
This hybrid study design incorporates the advantage of population-based sampling (minimized selection bias) and the cost-effectiveness of investigating associations of specific diseases with exposures, using all cases and randomly selected or matched controls from the study sample.

TABLE 1. Comparison of Cohort and Case-Control Studies

                                Cohort Studies      Case-Control Studies
Causal inference                More robust         Less robust
Estimation of incidence rates   Yes                 No
Estimation of relative risks    Yes                 No, but odds ratios
Cost                            High                Low
Time                            Long                Short
Loss to follow-up               Potential problem   Not an issue
Study rare diseases             Inefficient         Efficient
Study multiple outcomes         Possible            Not possible
Study multiple risk factors     Possible            Possible

The measures of association provided by case-control studies are odds ratios, which can be interpreted as relative risks if the disease is rare (< 10% in prevalence) or if the effect is not too extreme, for example, an odds ratio less than 2 to 3. The true relative risk can be provided only by cohort studies and is an estimate of the incidence (or risk) associated with an exposure relative to the incidence in the absence of the exposure. The odds ratio is the ratio of the odds of exposure among the cases to the odds of exposure among the controls. Odds are not a proportion; rather, they are the probability that an event occurs (p) relative to the probability that the event does not occur (1 − p), calculated as p/(1 − p). A classic example of distortion (bias) in an observed association occurred in studies of hormone replacement therapy (HRT): beneficial effects of HRT on cardiovascular and other health outcomes among middle-aged women were supported by more than 50 observational studies by different investigators.
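The relationship between the odds ratio and the relative risk described above can be checked numerically. The sketch below uses an invented 2 × 2 table and shows that, when the outcome is rare, the odds ratio closely approximates the relative risk:

```python
# Hypothetical 2 x 2 table in cohort layout; all counts are invented.
#                             disease  no disease
exposed_d, exposed_nd     =   50,      950      # 1000 exposed subjects
unexposed_d, unexposed_nd =   25,      975      # 1000 unexposed subjects

# Relative risk: ratio of incidences, estimable only from a cohort design.
risk_exposed = exposed_d / (exposed_d + exposed_nd)          # 0.050
risk_unexposed = unexposed_d / (unexposed_d + unexposed_nd)  # 0.025
relative_risk = risk_exposed / risk_unexposed

# Odds ratio: ratio of the odds p / (1 - p) in the two groups.
odds_exposed = exposed_d / exposed_nd
odds_unexposed = unexposed_d / unexposed_nd
odds_ratio = odds_exposed / odds_unexposed

print(f"RR = {relative_risk:.2f}, OR = {odds_ratio:.2f}")    # RR = 2.00, OR = 2.05
```

Because the outcome here is rare (5% vs 2.5%), the odds ratio (2.05) sits close to the relative risk (2.00); with a common outcome or a large effect, the two diverge, which is why the rare-disease condition matters when interpreting case-control results.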
In 2002, however, a single large, randomized, controlled trial, the Women's Health Initiative Study, showed that long-term use of HRT was associated with an increased risk of invasive breast cancer, heart disease, stroke, and pulmonary embolism (Rossouw et al., JAMA 2002;288:321-333). How is it that a single randomized, controlled trial was enough to overturn decades of established wisdom based on observational studies? Random allocation of subjects into predefined study groups is the linchpin of validity for randomized, controlled trials. Random allocation means that participants are allocated to intervention or control groups purely by chance; therefore, if the sample size is sufficiently large, any known and unknown confounders will be distributed evenly and balanced between study groups. The intervention is then the only difference between the groups, and hence any difference in outcomes must be the result of this difference in intervention. The randomized, controlled trial design is as close as one can come to an experimental design in human studies and is as pure a test of association as possible. However, practical limitations in real life mean that this even balance of confounders can be upset by a number of key issues.
These will be addressed in the fourth article of this series, but the key ones are: 1) the quality of the randomization: when assessing evidence arising from randomized, controlled trials, readers should examine relevant characteristics among study groups to ensure comparability across groups; and 2) analysis of study findings should follow an intention-to-treat approach: during analysis, study subjects should be grouped according to the original allocation determined by the randomization procedure, not by the treatments actually received, because participants may have crossed over from the control to the treated group. The intention-to-treat analysis is the only way to preserve the even balance of confounders achieved by randomization, and thus the strength of the inference about causation. Crossing between groups after randomization breaks this even balance and undermines the gains of randomization. Key concepts for understanding the validity of inferring causation or association under real-world conditions are bias and confounding. Bias is any force that tends to skew results away from the true association. Two main sources of bias are selection bias, a bias in the way participants are recruited into a study, and information bias, a bias in the way information is gathered. For example, a study of the causes of cataract in a sample with connective tissue disease yields very different results from a study addressing the same question in a healthy community-dwelling sample (selection bias). Ascertaining exposures retrospectively in those who have AMD may yield different results than in those who have normal vision (recall bias, a common type of information bias), given that there may be a tendency to recall even minimal exposures in those with disease. Confounding occurs when a third factor (a confounder) influences the association under investigation.
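The intention-to-treat principle above can be made concrete by grouping a few trial records two ways: by the arm assigned at randomization (intention-to-treat) and by the treatment actually received (as-treated). All records below are invented; the point is only the grouping logic:

```python
# Hypothetical trial records; "assigned" is the randomized arm,
# "received" is what the participant actually got (two crossed over).
subjects = [
    {"assigned": "treatment", "received": "treatment", "good_outcome": True},
    {"assigned": "treatment", "received": "control",   "good_outcome": False},  # crossover
    {"assigned": "control",   "received": "control",   "good_outcome": False},
    {"assigned": "control",   "received": "treatment", "good_outcome": True},   # crossover
]

def outcome_rate(records, arm, grouping_key):
    """Proportion with a good outcome in the given arm, under a grouping rule."""
    group = [r for r in records if r[grouping_key] == arm]
    return sum(r["good_outcome"] for r in group) / len(group)

# Intention-to-treat: analyze by the arm assigned at randomization.
itt_rate = outcome_rate(subjects, "treatment", "assigned")         # 1/2 = 0.5

# As-treated: analyze by the treatment received; this discards the
# balance of confounders that randomization created.
as_treated_rate = outcome_rate(subjects, "treatment", "received")  # 2/2 = 1.0

print(itt_rate, as_treated_rate)
```

The two groupings give different answers on the same data; only the intention-to-treat grouping preserves the comparability that randomization bought.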
For example, in searching for a causative link between smoking and AMD, it may be that smoking goes hand in hand with a diet low in antioxidants, which itself may be causative of AMD; that is, low antioxidant intake confounds the smoking-AMD association. Potential confounding factors may mask an association, may lead to a false association, or may change the direction of the association under investigation. Examining potential confounding effects requires content knowledge and statistical adjustment. In observational studies, one must try to eliminate bias and to measure and adjust for all potential confounders in the analyses to obtain as valid a measure of association as possible. However, one can never guarantee that all potential confounders have been measured, or measured accurately. Even adjustment for confounders using statistical methods can leave residual confounding. Hence, one can never be absolutely sure of an association documented only in observational studies. These key issues explain the power of the Women's Health Initiative randomized, controlled trial to overturn decades of observations for HRT. The discrepancy between the previous observations and the randomized, controlled trial was the result of selection bias, information bias, and confounding: women who were receiving HRT were healthier and saw doctors more frequently than women who did not receive HRT, and attempts to measure these confounders were either incomplete or subject to measurement error, leading to no adjustment or to residual confounding from these variables. When HRT users had better outcomes, the benefit was erroneously credited to HRT (Enserink, "Women's health: the vanishing promises of hormone replacement," Science 2002;297:325-326). It is worth noting that the discussion thus far has focused on issues of validity, that is, the so-called truthfulness of the findings.
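The smoking-diet-AMD scenario above can be illustrated with a small simulation. All probabilities are invented, and the effect of smoking is deliberately set to zero, so any crude association between smoking and AMD is pure confounding by diet; stratifying by diet (a simple form of statistical adjustment) makes it disappear:

```python
import random

random.seed(0)

# Invented model: a low-antioxidant diet raises AMD risk (0.10 vs 0.02)
# and also makes smoking more likely (0.6 vs 0.2). Smoking itself has
# NO effect on AMD in this simulation.
def simulate(n=100_000):
    rows = []
    for _ in range(n):
        low_diet = random.random() < 0.4
        smokes = random.random() < (0.6 if low_diet else 0.2)
        amd = random.random() < (0.10 if low_diet else 0.02)
        rows.append((low_diet, smokes, amd))
    return rows

def amd_risk(rows, smokes):
    group = [r for r in rows if r[1] == smokes]
    return sum(r[2] for r in group) / len(group)

rows = simulate()

# Crude comparison: smokers appear at higher risk, purely through confounding.
crude_rr = amd_risk(rows, True) / amd_risk(rows, False)

# Stratify by diet: within each stratum the association vanishes (RR near 1).
adjusted_rr = {
    diet: amd_risk([r for r in rows if r[0] == diet], True)
          / amd_risk([r for r in rows if r[0] == diet], False)
    for diet in (True, False)
}

print(f"crude RR ~ {crude_rr:.2f}, stratum RRs ~ {adjusted_rr}")
```

With these parameters the crude relative risk comes out well above 1 (around 1.8 in expectation) while both diet-specific relative risks hover near 1.0, mirroring how adjustment removes confounding only for factors that have actually been measured.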
Statistical tests and P values speak only to issues of precision, that is, how wide the confidence interval is around an estimate. It must be emphasized that no statistical testing can detect whether there is selection bias, information bias, or confounding. Large sample sizes aid only precision, and an estimate of the effect of an intervention may be precise and significant but still invalid because of the way an exposure was measured, loss to follow-up, or cross-over between study groups. It cannot be emphasized enough that no statistical test can be used to judge validity; only critical appraisal can detect biases and threats to validity. The foregoing discussion makes clear why a hierarchy of evidence has emerged with respect to validity about causation. Randomized, controlled trials are held as the highest grade of evidence for an association, although meta-analysis, in which multiple randomized, controlled trials are pooled, holds an even higher rank. Next come the observational designs, with cohort, case-control, and cross-sectional studies in descending order (Table 2) (National Health and Medical Research Council, A Guide to the Development, Implementation and Evaluation of Clinical Practice Guidelines, Canberra, Australia, 1999).

TABLE 2. Level of Evidence by Study Type(a)

Level   Rating Criteria
I       Evidence obtained from a systematic review or meta-analysis of all relevant randomized, controlled trials
II      Evidence obtained from at least one properly designed randomized, controlled trial
III-1   Evidence obtained from well-designed pseudorandomized controlled trials (without proper randomization)
III-2   Evidence obtained from comparative studies with concurrent controls and allocation not randomized (cohort studies), case-control studies, or interrupted time series with a control group
III-3   Evidence obtained from comparative studies with historical controls, 2 or more single-arm studies, or interrupted time series without a parallel control group
IV      Evidence obtained from descriptive case series, either post-test or pre-test and post-test

(a) Increasing levels of evidence indicate poorer quality of evidence. These ratings have been adapted from Appendix A, page 388 of: Fisher M, editor. United States Preventive Services Task Force. Guide to Clinical Preventive Services: An Assessment of the Effectiveness of 169 Interventions. Baltimore, Maryland: Williams & Wilkins, 1989.

However, this hierarchy should not be adopted blindly; there is now increasing recognition that not all randomized, controlled trials are equal. A badly performed randomized, controlled trial may rank lower than a well-conducted cohort or case-control study. Furthermore, there is also increasing recognition that even a well-conducted randomized, controlled trial does not mean that an intervention should be adopted automatically; translating a result into clinical practice depends on consideration of local circumstances, patient values, and resource availability (McAlister et al., JAMA 2000;283:2829-2836; Guyatt et al., JAMA 1999;281:1836-1843). The Grading of Recommendations Assessment, Development and Evaluation (GRADE) working group devised a scheme with extended levels of evidence to enumerate the various considerations that should be balanced in recommending an intervention (Guyatt et al., BMJ 2008;336:924-926). Finally, even when an association is robust, caution should be taken, because associations are not always causal. A judgment about causation also needs to incorporate such considerations as biological plausibility, evidence from other avenues (ie, animal models, cell-culture work), and consistency of data. In the end, causation is a subjective call that requires expert judgment, content knowledge, and critical appraisal skills; this series aims to help readers judge causation or association as free of bias as possible.

The authors indicate no financial support or financial conflict of interest. Both authors (J.J.W., J.A.) were involved in the design and conduct of the study; collection of data; management, analysis, and interpretation of data; and preparation, review, and approval of the manuscript. Institutional review board approval was not applicable.
