Abstract

Central Message: Randomized trials have unique challenges in surgery. The balance of this information with carefully analyzed observational data can provide a robust foundation of knowledge to inform surgical care.

This Invited Expert Opinion provides a perspective on the following paper: J Am Coll Cardiol. 2020 Aug 4;76(5):580-589. https://doi.org/10.1016/j.jacc.2020.05.069.

See Commentaries on pages 763 and 764.

“All models are wrong, but some are useful.” (George E. P. Box) [1. Box G.E.P. Science and statistics. J Am Stat Assoc. 1976; 71: 791-799.]

Obesity has been artfully described as starvation in a sea of calories. Applying similar logic, one might conclude that the age of information has left us desperately seeking truth in a flood of info-bytes. The number of retractions of scientific articles has risen in recent decades, from fewer than 100 annually before 2000 to nearly 1000 in 2014. [2. Brainard J., You J. What a massive database of retracted papers reveals about science publishing's ‘death penalty.’ Science. 2018. https://www.sciencemag.org/news/2018/10/what-massive-database-retracted-papers-reveals-about-science-publishing-s-death-penalty (accessed October 7, 2020).] Into this maelstrom enters an impressive group of experienced researchers suggesting that, for medicine, the ultimate source of truth is the prospective randomized controlled trial (RCT). [3. Fanaroff A.C., Califf R.M., Harrington R.A., Granger C.B., McMurray J.J.V., Patel M.R., et al. Randomized trials versus common sense and clinical observation: JACC Review Topic of the Week. J Am Coll Cardiol. 2020; 76: 580-589.] The argument appears compelling, as they present various trials that have reversed previous clinical impressions born of common sense and observational data. However, scrutiny of the list reveals the limitations as well as the value of prospective trials.

The ISCHEMIA trial [4. Maron D.J., Hochman J.S., Reynolds H.R., Bangalore S., O'Brien S.M., Boden W.E., et al. Initial invasive or conservative strategy for stable coronary disease. N Engl J Med. 2020; 382: 1395-1407.] is represented as disproving the belief that relieving ischemia by revascularization reduces the risk of mortality or myocardial infarction (MI). However, the study was limited to stable patients without left main disease, was underpowered for the surgical arm and for most subgroup analyses (crucially important in clinical practice), and only followed patients for 2 years, with the curves for the end point of MI or death actually crossing at 1 year and continuing to diverge in favor of revascularization at the end of 2 years. Disproof, or limitation? The STICH (Surgical Treatment for Ischemic Heart Failure) trial, [5. Jones R.H., Velazquez E.J., Michler R.E., Sopko G., Oh J.K., O'Connor C.M., et al. Coronary bypass surgery with or without surgical ventricular restoration. N Engl J Med. 2009; 360: 1705-1717.] which is purported to disprove the value of ventricular reconstruction, was actually designed to study the impact of a 30% reduction in ventricular volume in patients with previous MI. Incredibly, 13% of patients did not even have a history of MI, and the average volume reduction in the reconstruction arm was only 19%. Unfortunately, it was the operation rather than the trial that was discredited, and the “truth” remains obscure.
In contrast, the Cardiothoracic Surgical Trials Network study of mitral repair for moderate ischemic mitral regurgitation [6. Michler R.E., Smith P.K., Parides M.K., Ailawadi G., Thourani V., Moskowitz A.J., et al. Two-year outcomes of surgical treatment of moderate ischemic mitral regurgitation. N Engl J Med. 2016; 374: 1932-1941.] demonstrated a reduction in mitral regurgitation without associated reduction in mortality or heart failure. However, this was not, as suggested, in contrast to previous knowledge but actually consistent with observational studies. [7. Virk S.A., Tian D.H., Sriravindrarajah A., Dunn D., Wolfenden H.D., Suri R.M., et al. Mitral valve surgery and coronary artery bypass grafting for moderate-to-severe ischemic mitral regurgitation: meta-analysis of clinical and echocardiographic outcomes. J Thorac Cardiovasc Surg. 2017; 154: 127-136.]

The design and implementation of RCTs in surgery face specific challenges. Lack of equipoise, rapid evolution in techniques, differences in expertise and referral patterns, learning curve effects, challenges in blinding operators and assessors, perceived threat to professional reputation, and limited education of the surgical community in clinical research are some of the complexities unique to surgical RCTs. [8. Solomon M.J., McLeod R.S. Should we be performing more randomized controlled trials evaluating surgical operations? Surgery. 1995; 118: 459-467.] Issues of expertise and skill acquisition can be avoided by trial design that selects only those operators with demonstrable expertise in the operative approaches under study. Here again, internal validity may be gained at the price of general applicability.
The more subtle issue of questionable equipoise can be addressed with innovative approaches such as that used in the CORONARY trial, in which sites had experts in both on- and off-pump surgery and only after treatment randomization were patients assigned to the surgical team most skilled in the assigned approach. [9. Lamy A., Devereaux P.J., Prabhakaran D., Taggart D.P., Hu S., Paolasso E., et al., for the CORONARY investigators. Off-pump or on-pump coronary-artery bypass grafting at 30 days. N Engl J Med. 2012; 366: 1489-1497.] However, due to the considerable challenges posed by randomized trials, the evidence base for surgery has traditionally comprised mostly observational studies, with RCTs representing a small minority of the published literature. [10. Howes N., Chagla L., Thorpe M., McCulloch P. Surgical practice is evidence based. Br J Surg. 1997; 84: 1220-1223.] In a review of the evidence supporting the 10 most commonly performed surgical procedures in the United States published between 1970 and 2018, RCTs represented less than 10% of the published studies, with no increase in the last 15 years. [11. Henry M., Rong L.Q., Wingo M., Rahouma M., Girardi L.N., Gaudino M. The evidence on the ten most common surgical interventions in the United States from 1970 to 2018. Ann Surg. 2019; 70: e16-e17.] In cardiac surgery, RCTs represented 0.3% to 1.5% of the studies published in the last 20 years, with their proportion decreasing over time. [12. Gaudino M., Kappetein A.P., Di Franco A., Bagiella E., Bhatt D.L., Boening A., et al. Randomized trials in cardiac surgery: JACC review topic of the week. J Am Coll Cardiol. 2020; 75: 1593-1604.]

Herein we address the strengths and limitations of randomized trials and observational studies in cardiac surgery (with particular emphasis on the use of electronic health records [EHRs]) and highlight the importance of both for evidence accrual and clinical decision making.

While observational studies are important for the description of trends and event rates and allow identification of risk factors and the development of predictive modeling, the comparative assessment of 2 or more surgical interventions based on observational data is a complicated issue. Surgeons are trained to adapt the intervention to the anatomic and physiologic characteristics of the individual patient, as well as to their own technical expertise. This process of individualizing treatment to the patient and the surgeon is one of the foundations of good surgery, but is also responsible for the strong treatment-allocation and expertise biases in all observational comparative studies of surgical procedures. Statistical adjustment for patient covariates in the compared groups is generally used to minimize these biases; regression analysis and propensity score (PS) techniques are among the most common approaches. The key problem with all these techniques, however, is that they can account only for known and measured confounders. The “eye-balling” process of treatment individualization described previously is a complex integration of numerous variables related to the patient's demographics, functional and psycho-social status, the characteristics of the underlying disease, the available logistics, and the surgeon's level of comfort with the possible technical options.
Unfortunately, only a small minority of the variables included in this process can be objectively measured and are captured in our databases, so that even the best statistical methodologies do not guarantee the absence of hidden confounders and residual bias. The randomized trial approach intrinsically avoids these problems by theoretically neutralizing the potential impact of all variables, both recognized and unknown, other than the treatment assignment in the evaluation of outcome.

Hidden confounders, and not true biologic effect, are likely responsible for the difference seen in observational studies comparing the outcomes of patients receiving single versus bilateral internal thoracic arteries for bypass surgery. [13. Gaudino M., Di Franco A., Rahouma M., Tam D.Y., Iannaccone M., Deb S., et al. Unmeasured confounders in observational studies comparing bilateral versus single internal thoracic artery for coronary artery bypass grafting: a meta-analysis. J Am Heart Assoc. 2018; 7: e008010.] The effect was found despite the use of PS matching, testifying to how even complex techniques of statistical adjustment are inadequate to neutralize the strong treatment allocation bias. Sensitivity analyses can demonstrate the stability of the results, but cannot replace the internal validity of random assignment. Of note, after almost 4 decades of observational studies reporting better clinical outcomes in patients receiving bilateral internal thoracic arteries, the first adequately powered RCT did not show any difference between the 2 groups. [14. Taggart D.P., Benedetto U., Gerry S., Altman D.G., Gray A.M., Lees B., et al. Bilateral versus single internal-thoracic-artery grafts at 10 years. N Engl J Med. 2019; 380: 437-446.] [15. Gaudino M., Rahouma M., Hameed I., Khan F.M., Taggart D.P., Flather M., et al. Disagreement between randomized and observational evidence on the use of bilateral internal thoracic artery grafting: a meta-analytic approach. J Am Heart Assoc. 2019; 8: e014638.] Even though that study was subject to many limitations and not “definitive,” the comparison of the outcome curves of the randomized trial and the previous propensity-matched observational studies is an important reminder of the level of bias and confounding persisting even in observational studies that use complex statistical methods for adjustment (Figure 1). [13]

Observational studies provide important information on temporal trends in disease prevalence or in the adoption of surgical techniques. They can successfully evaluate the association of patient characteristics with outcomes and are key to developing and validating risk models. Observational studies tend to have greater external validity than RCTs, as they enroll a more heterogeneous population and measure the results of surgical interventions in the “real world” outside of the highly controlled setting of RCTs (examining effectiveness rather than efficacy). While RCTs provide almost unbiased estimates of the average patient treatment effect (ATE, the average effect in the average patient), the heterogeneity and large sample size of observational studies allow for estimation of the individual patient treatment effect (the individual effect in the single patient), although with a greater risk of bias. As shown in Figure 2, [16. Rice T.W., Lu M., Ishwaran H., Blackstone E.H., Worldwide Esophageal Cancer Collaboration Investigators. Precision surgical therapy for adenocarcinoma of the esophagus and esophagogastric junction. J Thorac Oncol. 2019; 14: 2164-2175.] the individual patient treatment effect may differ substantially from the ATE across patients and is generally more reflective of the clinical world outside the controlled setting of RCTs. In contrast, the RCT-based ATE is the most accurate estimate of the comparative differences between treatments and cannot be ignored in clinical decision making.

Another important limitation of RCTs relates to the fact that, to increase the event rate and thereby reduce the needed sample size, trialists often rely on composite outcomes. The selection of the individual components of the composite is a key decision, and it is of paramount importance that they all express the same biological process. Unfortunately, it is not uncommon to observe the uncomfortable situation in which different events included in the composite outcome move in opposite directions, leaving great uncertainty about the most appropriate clinical interpretation of the trial result. [17. Freemantle N., Calvert M., Wood J., Eastaugh J., Griffin C. Composite outcomes in randomized trials: greater precision but with greater uncertainty? JAMA. 2003; 289: 2554-2559.] In addition, clinically less relevant events are generally more frequent than clinically important events, and often drive the composite outcome. This was the controversial situation of the EXCEL (Evaluation of XIENCE vs Coronary Artery Bypass Surgery for Effectiveness of Left Main Revascularization) trial, in which a mostly enzymatic and relatively more frequent outcome, perioperative MI, was directionally opposite to all the other outcomes and was the key driver of the result of the primary analysis. [18. Stone G.W., Kappetein A.P., Sabik J.F., Pocock S.J., Morice M.C., Puskas J., et al. Five-year outcomes after PCI or CABG for left main coronary disease. N Engl J Med. 2019; 381: 1820-1830.] It must also be noted that small RCTs generally overestimate the treatment effect (likely due to unconscious bias of the investigators) [19. Dechartres A., Trinquart L., Boutron I., Ravaud P. Influence of trial sample size on treatment effect estimates: meta-epidemiological study. BMJ. 2013; 346: f2304.] and that suboptimal trial conduct (high crossover or protocol violation rate) can bias the result toward the null (in superiority trials) or toward rejection of the null hypothesis (in non-inferiority trials).

The adoption of new techniques in surgery does not always wait for the publication of solid supportive evidence. In the case of techniques that are perceived as safe and effective based on clinical judgment and biologic rationale, large-scale adoption may proceed in the absence of RCT-based validation: “It's always too early until, unfortunately, it's suddenly too late.” [20. Buxton M. Problems in the economic appraisal of new health technology: the evaluation of heart transplants in the UK. In: Drummond M. Economic Appraisal of Health Technology in the European Community, EC Series on Health Services Research. Oxford University Press, Oxford 1987: 103-118.] In this case, the lack of equipoise makes RCTs extremely challenging, and observational studies become a more reliable source of evidence. The use of the left internal thoracic artery rather than the saphenous vein to graft the left anterior descending artery or of laparoscopic rather than open cholecystectomy are examples of procedures that became standard of care even before their superiority could be proven in the setting of appropriately powered RCTs. [21. Allori A.C., Leitman I.M., Heitman E. Delayed assessment and eager adoption of laparoscopic cholecystectomy: implications for developing surgical technologies. World J Gastroenterol. 2010; 16: 4115-4122.]

EHRs represent an attractive source of large amounts of data, often accumulating at a fast pace without the need for a specifically designed research infrastructure, and may be a useful source of information during times when no evidence is available to inform physicians' choices, as during the initial months of the coronavirus disease 2019 (COVID-19) pandemic. Moreover, the ready availability of electronic and administrative data, despite their many limitations, saves the considerable expense entailed in the proper conduct of RCTs. Even if expense were not an issue, it must certainly be acknowledged that not every issue of surgical nuance is appropriately or pragmatically going to be the subject of an RCT.

However, EHR data have important limitations when used for research purposes. The first obvious but critical issue is that EHR data, unlike data from clinical trials, are collected for clinical and billing purposes rather than to answer specific research questions. Clinical parameters may be collected at times that are not uniform, and different instruments may be used to collect the same variable in different patients. Some variables may be collected sporadically or not at all if not part of routine medical care. Data may be very sparse and plagued by missingness whose mechanism is difficult to ascertain. EHR data are in continuous evolution as they are entered, coded, and corrected over time. Some inconsistencies are captured and rectified, whereas others may permanently remain part of the data. The “copy-and-paste” phenomenon can replicate inaccurate information that may or may not be corrected over time, potentially yielding inaccurate or frankly different information depending upon when the data are accessed.
Therefore, analyses conducted at different points in time could yield different results. Decisions related to patient treatment are not random (as they would be in a clinical trial), but rather very much related to patients' disease severity, comorbid conditions, and physician judgment, so the strong treatment allocation bias described for observational studies must be assumed for EHR-based research.

The methodological problems usually encountered in the analysis of observational studies are amplified by biases and other data problems that affect EHRs. Treatment allocation bias makes groups different in ways that are difficult to fully understand and properly correct. Multivariable analyses, highly effective when few known and measurable confounders need to be controlled for, are limited in their ability to distinguish treatment effect from patient selection if the groups are fundamentally different. Analyses based on PS have been extensively used to obtain comparable groups. [22. Kurlansky P. Lies, damned lies and statistics. J Thorac Cardiovasc Surg. 2015; 150: 20-21.] Whether based on PS-matched subgroups or weighted by inverse PS (inverse probability of treatment weighting), these approaches are valid only if the PS model is specified correctly. To avoid residual or newly created biases, thorough diagnostics should be performed on the PS model, including checks for correct model specification, evaluation of standardized differences and extreme scores, and use of robust variance estimates. Exploring the potential impact of unmeasured variables will help to bring the findings into clearer focus. [23. VanderWeele T.J., Ding P. Sensitivity analysis in observational research: introducing the E-value. Ann Intern Med. 2017; 167: 268-274.]

Immortal time bias is also often encountered both in observational studies and in EHR-based analyses.
This problem occurs when treatments are not initiated at the same time for all patients in the cohort. Since the patients with the later treatment necessarily needed to survive until the time that treatment was initiated, comparing treatment groups without accounting for the time of treatment initiation could spuriously advantage those treatments that were initiated later. Since available information is based on the patients' clinical course rather than prescribed timelines, truncation and censoring also tend to occur more often in EHR-based studies than in clinical trials. It is important to correctly identify censoring and truncation times and evaluate whether they are related to the outcomes of interest. In addition, competing risks are inevitable, as patients may experience multiple events during the observation period that may preclude the occurrence of the outcome of interest. Models that account for all these factors can be very complex and require attentive evaluation of all assumptions.

In the absence of the prospective vigilance of RCTs, data of potential interest are commonly missing in EHRs. Moreover, while in clinical trials the mechanism by which data are missing is often known and potentially accounted for with reliable multiple imputation techniques, EHR-based studies usually lack the crucial information needed to inform the proper imputation strategy. Missingness is usually informative of the patient's condition and outcomes (eg, less sick patients are more likely to have missing diagnostic tests at admission). Analyses restricted to patients with available data reduce the generalizability of the findings or lead to biased estimates, depending on the underlying missing data mechanism.

The potential for misclassification presents a further challenge.
Unlike clinical trials, where patients' baseline characteristics are captured and source verified, EHR-based studies may need to combine different data elements (eg, diagnosis, laboratory values, medications) to determine the presence or absence of a condition. Depending on the algorithm used and the accessibility and completeness of data, definitions may vary in their sensitivity and specificity, with downstream implications for modeling and interpretation. [24. Goldstein B.A., Bhavsar N.A., Phelan M., Pencina M.J. Controlling for informed presence bias due to the number of health encounters in an electronic health record. Am J Epidemiol. 2016; 184: 847-855.]

Lastly, heterogeneity is an intrinsic characteristic of EHR data. Data may come from different hospitals within a health system or even from different health systems. While a certain level of variability is welcome, as it allows for more complete analyses and high generalizability, disorganized heterogeneity may confound the results and make it difficult to obtain precise estimates.

There is an unfounded belief that the large sample sizes offered by EHR or observational studies, particularly those involving large registry or administrative datasets, will overcome these problems. Unfortunately, no sample size is large enough to correct for biases induced by missing, misclassified, or inconsistent data, or by incorrect analytical methods. Whenever possible, appropriate sensitivity analyses should be performed and can assist in validating the results.

Consideration of the sources of data would not be complete without mention of the analytical framework within which such data are interpreted. There is no doubt that artificial intelligence and machine-learning techniques will have a major impact on our ability to use, integrate, and understand the multiple data sources available. Although this is a huge topic beyond the scope of this review, a few comments seem in order.
The prodigious processing power of such approaches, which can manage terabytes, petabytes, or exabytes of data, enables access to the massive data sources of digital imaging, of the various “omics” (genomics, epigenomics, metabolomics, proteomics, microbiome, etc), of administrative and EHR databases, of registries and clinical trial results, and of emerging information from physiological measurements from personal devices. Integration with natural language processing may facilitate inference from medical records, much of which is not in digital format, as well as from rapidly emerging medical literature. The relatively fewer assumptions underlying such methodologies tend to make such approaches less vulnerable to missing data and more adept at identifying patterns and associations that may not have been previously appreciated. As such, the concern regarding specific data sources may become less compelling, and the ability to focus on the individual rather than the average treatment effect may become more prevalent. Innovative approaches, such as Mendelian randomization, integrate the concept of the randomized trial with genomic information potentially available in an observational context, thus integrating rather than contrasting approaches. However, as learned painfully from IBM Watson's largely failed initial foray into the world of oncology, promise does not always yield success. Such approaches remain vulnerable to the integrity of the data, the potential bias in technique selection and organization, and the highly heterogeneous “human” element of medical care.

In short, no source of “truth” (even the most expertly crafted prospective randomized trial, the most elegantly analyzed observational study, or the most innovatively constructed machine learning matrix) can be separated from the need for careful assessment of data integrity, design bias and limitation, and humble willingness to accept evidence that contrasts with previous belief.
It is from the balance of approaches that the most valuable information is more likely to emerge.

The authors reported no conflicts of interest. The Journal policy requires editors and reviewers to disclose conflicts of interest and to decline handling or reviewing manuscripts for which they may have a conflict of interest. The editors and reviewers of this article have no conflicts of interest.
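
As a supplementary illustration of two of the diagnostics mentioned in the discussion of propensity score analyses (evaluation of standardized differences, and exploration of unmeasured confounding with the E-value of VanderWeele and Ding), a minimal Python sketch follows. It is not taken from the article or from any of the cited studies. The function names and the example numbers are hypothetical, the standardized difference uses the common pooled-standard-deviation form for a continuous covariate, and the E-value is computed for a point estimate on the risk-ratio scale only.

```python
import math
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """Standardized mean difference of one continuous covariate.

    Assumption: pooled SD is sqrt((s_t^2 + s_c^2) / 2), with sample
    variances, as commonly used for balance checks after PS matching.
    """
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    if pooled_sd == 0:
        raise ValueError("covariate has no variability in either group")
    return (mean(treated) - mean(control)) / pooled_sd


def e_value(risk_ratio):
    """E-value for an observed risk ratio (point estimate only).

    E = RR + sqrt(RR * (RR - 1)), applied to 1/RR when RR is below 1.
    """
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    rr = risk_ratio if risk_ratio >= 1 else 1 / risk_ratio
    return rr + math.sqrt(rr * (rr - 1))


if __name__ == "__main__":
    # Hypothetical ages (years) in two matched groups; made-up numbers.
    age_treated = [61, 64, 58, 70, 66]
    age_control = [63, 67, 59, 72, 69]
    smd = standardized_mean_difference(age_treated, age_control)
    print(f"Standardized difference for age: {smd:.3f}")

    # Hypothetical observed risk ratio favoring one treatment.
    print(f"E-value for a risk ratio of 0.70: {e_value(0.70):.2f}")
```

An E-value close to 1 indicates that even weak unmeasured confounding could fully explain the observed association, whereas a large E-value indicates that only a strong unmeasured confounder could do so. Neither diagnostic substitutes for the internal validity of random assignment.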
