Abstract

“Future randomized controlled studies are needed” is a conclusion all too familiar to readers of scientific articles. This statement, distributed liberally in the literature, has come to signify little more than an obligatory admission of a study’s deficiencies. It simultaneously seeks to absolve the authors of any fault for potentially exaggerating the importance of their statistical findings while leaving it to the reader to decide how to apply and disseminate the novel information. The statement also implies that higher quality studies could and should be devised and points to salvation by way of the randomized controlled trial (RCT), the pinnacle of evidence-based medicine.1 Despite this, historical and present-day data show that RCTs are undertaken less frequently in surgery than in other areas of medicine.2–4 This raises concerns about the quality of surgical research, but what can surgeons do about it? Surveys have demonstrated a preponderance of cohort studies and case series in surgical journals, with RCTs accounting for fewer than 10% of publications.5,6 The shortage of high-quality surgical research has been attributed to unique methodologic, practical, and ethical considerations in evaluating surgical procedures, as well as surgeon reluctance and limited experience with surgical trials.3,7,8 Although high-quality RCTs are something to strive for, it is also clear that not all problems in surgery can be evaluated in an RCT. Solomon and McLeod studied 250 research questions in gastrointestinal surgery and concluded that, under ideal conditions, only 38.8% could have been answered with an RCT.9 Rather than attempting to answer every question with an RCT, surgeons should focus on using the best research design for their clinical questions. A variety of study designs can yield important insights if they are applied to the correct problem and conducted with methodologic rigor.
The idea, development, exploration, assessment, and long-term study (IDEAL) model is a five-stage framework developed to guide best research practices throughout progressive stages of surgical innovation (Table 1).10,11 Early stages focus on refining the technique and its safety profile, whereas later stages are generally amenable to higher levels of scientific inquiry. Taking into account the stage of development of the surgical technique being investigated, a simple stepwise approach can be applied, choosing the study design with the highest feasible level of evidence. In this article, we outline important concepts and discuss various study designs—RCTs, observational studies with control groups, and case series—with a focus on minimizing biases. We consider various practical and methodologic issues specific to surgical research throughout.

Table 1. IDEAL Stages of Surgical Innovation(a)

Stage | Objectives | Study Methods
1. Idea | Initial technical description of new surgical procedure or technology | Case reports, case series
2a. Development | Description of early experience with new procedure or technology, including technical modifications and short-term/safety-related outcomes | Prospective case series, observational studies
2b. Exploration | Larger exploratory study to better define the relevant comparator groups and outcomes, and evaluate feasibility of a future RCT | Prospective observational studies, RCTs(b)
3. Assessment | Definitive comparative evaluation against the standard of care | RCTs, large prospective observational studies(c)
4. Long-term study | Collect data for surveillance in real-world practice, identify new complications, and revise indications as needed | Registry-based prospective cohort studies or case-control studies, rare case reports

(a) Developed from the initial description of the IDEAL framework by McCulloch P, Altman DG, Campbell WB, et al. No surgical innovation without evaluation: the IDEAL recommendations. Lancet 2009;374:1105–1112.
(b) Performed when feasible at this stage.
(c) Appropriate when ethical or practical factors preclude performing an RCT.

DESIGNING A SURGICAL RESEARCH STUDY

A 1994 commentary entitled “The Scandal of Poor Medical Research” was a heated outcry for higher quality research to be performed for a purpose greater than expanding the researcher’s curriculum vitae.12 It pointed to a variety of factors plaguing medical research, including flawed designs, inadequate samples, inappropriate statistical analyses, and misleading interpretation of results. At the heart of the problem was the “publish or perish” research climate combined with a “general failure to appreciate the basic principles underlying scientific research.” Although it is not the goal of this article to review these principles in detail, the concepts of validity, bias, and statistics warrant consideration for our subsequent discussion.

Once a question and hypothesis have been established, choosing the study design is the next step. The process should consider ways to maximize validity and minimize bias. Internal validity refers to the accuracy with which the findings reflect true variation in the outcomes measured in the study sample, whereas external validity refers to the extent to which the findings are generalizable to clinical populations encountered in practice.13 The interplay between these concepts is complex. By way of selective inclusion and exclusion criteria, RCTs ensure that the groups of patients compared are equivalent in almost all aspects other than the treatment administered, striving to achieve a high level of internal validity. This can come at the expense of external validity: surgeons in real-world practice encounter patients who would not meet the strict inclusion and exclusion criteria of such studies.
Biases refer to factors outside of the surgical intervention that systematically influence the results of a study.14 They detract directly from the validity of the study and can arise during study design and preparation, data collection, analysis, and interpretation of results.13,14 Biases are categorized conceptually in terms of their overarching similarities.15 Important conceptual categories are selection biases, observation biases, and confounders (Table 2). Selection biases occur during the design of a study and result in unintended differences between comparison groups. They are attributable to systematic flaws in the methodology used to select patients. For instance, in a nonrandomized study comparing a novel surgical procedure with nonoperative treatment, patients selected to undergo surgery would likely be healthier, more compliant, and have better access to care than patients treated nonoperatively. These baseline factors stack the odds in favor of surgical outcomes. Observation biases occur during data collection and can be attributed to flaws in the measurement tools used, effects of the study environment on patient behavior, or deferential attitudes of surgeons toward one of the groups. For example, surgeons may be more invested in obtaining favorable outcomes and minimizing complications in surgically treated patients, unintentionally providing better follow-up care. A confounder is any unaccounted-for “third variable” responsible for an apparent causal relationship observed in the study. This could be a patient factor, such as smoking, in a study comparing infection rates after different wound care regimens, or a treatment-related factor, such as asymmetry in the duration of postoperative immobilization, in a study comparing different interventions for fractures. Such factors could be more pertinent than the treatments themselves to the outcomes being investigated.

Table 2. Examples of Biases in Surgical Research

Bias | Description | Means of Minimization

Selection biases
Nonrespondent bias | Patients declining to participate in the study differ systematically from those who take part | Optimizing communication of objectives, limiting burden of participating
Referral bias | Patients referred to the clinician and recruited to the study have more severe disease | Predefined and evidence-based inclusion criteria
Channeling bias | Patients assigned to interventions based on prognostic factors | Randomized studies with allocation concealment, stratification in nonrandomized studies
Chronology bias | Intervention group compared with historical cohort | Using concurrent comparison groups

Observation biases
Recall bias | Inaccurate reporting of treatment response by patients | Using prospective design and objective outcome measurements
Hawthorne effect | Change in behavior as a result of participation in a study | Randomization, blinding
Performance bias | Results influenced by variability in surgical technique or expertise | Standardized protocols, different surgeons for different procedures, stratification by surgeon level of expertise
Transfer bias | Patients lost to follow-up differ from those who completed the study | Maintaining up-to-date contact information, limiting study burden, issuing reminders
Detection bias | Outcomes not measured uniformly in comparison groups | Concurrent comparison groups, blinding, predefined objective outcomes
Conflict of interest | Competing interests or secondary gain influence outcomes | Mandatory disclosure, randomized blinded designs

Misuse of statistics and inappropriate interpretation of statistical findings are other common culprits of poor surgical research.
The majority of these issues are attributable to unintentional oversights, but a small proportion are caused by deliberate fraud.16,17 Common statistical errors include inadequate power, selection of inappropriate statistical tests, and inappropriate subgroup analysis.18–20 Complex statistical techniques are often relied on to make up for deficiencies in the study design, such as lack of randomization. Studies also tend to focus on statistical differences without considering clinically important thresholds, leading to exaggeration of the findings. Like strategies to minimize bias, the use of statistics should be planned when designing the study, with primary and secondary outcome variables and the statistical plan specified a priori. A well-designed study can also simplify the statistical methods needed. A thorough understanding of validity, bias, and statistical methods is critical when selecting any of the specific study designs discussed below.

RANDOMIZED CONTROLLED TRIALS

Now suppose that you have formulated a novel and structurally sound research question and have the resources at your institution to conduct the ideal study. A logical approach would be to start atop the pyramid of evidence and determine whether an RCT could be devised before considering other study types at lower levels of evidence.21 RCTs are generally most appropriate in the exploration or assessment stage of surgical innovation, once the technical aspects and safety profile of the procedure are established. Conducting an RCT requires intensive devotion of resources and time. The Wrist and Radius Injury Surgical Trial, led by the senior author (K.C.C.) to evaluate outcomes of treatments for distal radius fractures, took 10 years to complete and required funding from two National Institutes of Health grants.22 The same factors that make the RCT a coveted entity compared with other types of studies can also make it unwieldy in surgical research.
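The a priori statistical planning emphasized above begins with a sample-size calculation, since an underpowered trial wastes the resources it consumes. A minimal sketch of the standard two-arm, two-sided normal-approximation formula for comparing means (all planning values here are hypothetical illustrations, not taken from any trial in this article):

```python
import math
from statistics import NormalDist

def sample_size_two_means(delta, sd, alpha=0.05, power=0.80):
    """A priori sample size per group for a two-arm trial comparing means:
    n = 2 * ((z_{1-alpha/2} + z_{power}) / (delta / sd))^2, rounded up."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)      # 1.96 for a two-sided alpha of 0.05
    z_beta = z(power)               # 0.84 for 80% power
    effect = delta / sd             # standardized effect size
    return math.ceil(2 * ((z_alpha + z_beta) / effect) ** 2)

# Hypothetical planning values: detect a 10-point difference on a
# patient-reported outcome scale with an assumed standard deviation of 25.
n_per_group = sample_size_two_means(delta=10, sd=25)
print(n_per_group)  # 99 per group, before inflating for expected attrition
```

In practice the result is inflated for anticipated loss to follow-up, and a dedicated power routine (e.g., in a statistics package) would be used for outcomes that are not continuous.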
Ethical and Practical Considerations

A variety of ethical and practical considerations pose challenges for surgical RCTs. Like other comparative studies, RCTs are designed to test a null hypothesis, which requires a control group. Unlike pharmaceutical trials, a methodologically equivalent control in surgery, such as a placebo, is rarely possible. One might be tempted to consider nonoperative treatment analogous to a placebo. This common view has been the basis of many surgical trials but is often misguided. In reality, nonoperative treatments are highly variable and difficult to control. They can be interventions such as activity modification, immobilization, physical therapy, injections, or a combination thereof. Patients may also implement their own individualized lifestyle changes, such as specialized diets and fitness regimens. Even if the nonoperative treatment is tightly and reliably controlled, the natural history of the patient’s condition may confound outcomes. In surgical trials, the true equivalent of a placebo would be sham surgery, a notion that transgresses ethical boundaries except in the most selective circumstances.23 In most cases, the use of an RCT to evaluate a surgical procedure mandates clinical equipoise between the surgical treatment being investigated and some other treatment.8 Nevertheless, even the existence of true clinical equipoise does not guarantee the feasibility of an RCT. Randomization and allocation concealment can be difficult to execute.7,8 These attributes strip the patient and physician of the opportunity for shared decision-making. Patients may be uncomfortable with randomized treatment selection without input from the surgeon, or with the notion of being part of an experiment.
Patients often arrive at the surgeon’s office with a preference for a particular treatment.24 Even when all of these factors are accounted for, patients who agree to participate in an RCT may differ from patients who do not consent, introducing selection bias.3 Ethical considerations are not limited to the patient. Surgeons conducting the study must strive to treat participants equally to avoid unduly influencing the results of the treatment groups. However, surgeons often have their own preferences for one procedure over another.25 If the same surgeon is performing multiple procedures in the study, he or she may also be more facile in one of the procedures, creating potential for performance bias. This brings forth the issue of blinding, which is generally easier to accomplish in medical trials than in surgical trials. Clearly, the surgeon performing the operation cannot be blinded, but it is also difficult to ensure that patients are blinded to the treatment that they received. Patients may become aware of the surgery that was performed based on the shape of a scar, the length of time spent in the operating room, or some other discernible feature. The ethics of withholding information about the surgery from the patient and examiners at follow-up visits are controversial, particularly if difficulties with recovery or complications arise. Cost, as well as challenges in recruiting patients and ensuring that they return for follow-up, can also make an RCT unfeasible.26 Given these ethical and practical factors, the other study designs discussed below may be better options.

Minimizing Bias in RCTs

Good design is crucial for maximizing validity and minimizing bias.
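To make the mechanics of randomization and allocation concealment concrete, the following sketch (arm names, stratum labels, and block size are illustrative assumptions, not drawn from any cited trial) generates stratified permuted-block allocation lists of the kind a remote randomization service would hold on the investigators' behalf:

```python
import random

def allocation_list(n, block_size=4, arms=("surgery", "cast"), seed=None):
    """Permuted-block randomization for one stratum: every block contains
    equal numbers of each arm in random order, keeping group sizes balanced."""
    rng = random.Random(seed)
    per_arm = block_size // len(arms)
    sequence = []
    while len(sequence) < n:
        block = list(arms) * per_arm   # e.g., two of each arm per block of 4
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n]

# One concealed list per stratum (e.g., a prognostic factor such as fracture
# displacement). A remote service holds the lists and reveals the next
# assignment only after a patient has been irrevocably enrolled.
strata = {
    "displaced": allocation_list(20, seed=1),
    "minimally displaced": allocation_list(20, seed=2),
}
next_assignment = strata["displaced"].pop(0)
```

Because assignments are generated in advance and held remotely, the enrolling surgeon cannot foresee or influence the next allocation, which is precisely the channeling bias safeguard listed in Table 2.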
Alarmingly, analyses of all RCTs published in The Journal of Bone and Joint Surgery between 1998 and 2013 showed that trials evaluating surgical procedures (as opposed to nonoperative or medical interventions) and trials with surgeons as first authors scored significantly lower in quality.27,28 To illustrate the features of a well-designed surgical RCT, consider a landmark trial investigating open reduction and internal fixation versus casting for minimally displaced scaphoid waist fractures (the Scaphoid Waist Internal Fixation for Fractures Trial).29 This study sought to answer a timely question in a longstanding area of clinical equipoise. Allocation concealment and randomization were achieved with a remote randomization service that stratified patients based on their pattern of fracture displacement, further minimizing confounders. Sample size was determined by an a priori power analysis, and statistical analyses were performed by a blinded statistician following a strict prespecified plan. Primary and secondary outcomes were stated from the outset, and potentially subjective radiologic tests were evaluated independently by two musculoskeletal radiologists and a surgeon. Rather than blinding patients and surgeons, the investigators left the surgical technique and postoperative protocols to the discretion of the treating surgeons, allowing patient-centered care and accounting for differences in surgeons’ technical preference and skill. This practical approach sought to minimize bias while also maximizing external validity, which was important for a trial aiming to inform clinical practice on an international scale. Empiric data support the influence of sound study design on the results of an RCT. Colditz et al.
found that the likelihood of an improved outcome from an experimental treatment increased significantly in the absence of randomization.30,31 In a study analyzing 250 controlled trials, treatment effects were exaggerated by 41% when allocation concealment was inadequate and by 30% when the extent of concealment was unclear.32 Trials that were not double-blinded yielded treatment effects 17% greater than double-blinded trials. Methodologic problems like these can lead to inaccurate interpretation of findings and distort the knowledge found in the published literature. They can be avoided by using validated quality checklists, such as the Consolidated Standards of Reporting Trials (CONSORT) guidelines, when planning a study.33 Such frameworks serve as an important quality safeguard and should be familiar to surgeons undertaking RCTs. Grading systems specific to plastic surgery have also been developed.34

OBSERVATIONAL STUDIES WITH A CONTROL GROUP

Many questions in surgery cannot be addressed with an RCT but are amenable to other study designs. A nonrandomized study with a control group should be considered when one of the ethical or practical factors discussed above precludes an RCT. Commonly used designs are cohort studies and case-control studies (Fig. 1).15 Although cohort studies are in a higher tier of evidence, both designs have specific indications, methodologic considerations, and biases.

Fig. 1. Observational study designs.

Like RCTs, cohort studies can be highly influential in their scope. Because of fewer upfront demands on participants and less stringent inclusion criteria, these studies lend themselves to broader investigations when more knowledge is needed before performing an RCT.
For instance, a prospective cohort study was devised to evaluate the effectiveness and safety of immediate implant-based breast reconstruction with or without mesh following mastectomy (the Implant Breast Reconstruction Evaluation study).35 This is a current topic of interest for many plastic surgeons, with recent trends toward immediate reconstruction following mastectomy.36 Although previous RCTs evaluating immediate breast reconstruction had been performed, their validity was questioned because of performance biases and small sample sizes.37–39 The Implant Breast Reconstruction Evaluation study addressed this gap in knowledge with an exploratory design that included more than 2000 patients from 81 participating centers. Patients underwent different variations of breast reconstruction at the discretion of the treating surgeon, eliminating the concerns about patient preference that are a major hindrance to RCTs in breast surgery.40 Multiple predefined primary outcomes were evaluated, and multivariable logistic regression was used to control for confounders. In addition to avoiding the practical hurdles of RCTs, the advantages of cohort studies illustrated by this example include obtaining larger samples, evaluating multiple exposures and outcomes at once, and generating new hypotheses for future studies through an exploratory approach. Surgeons must also be aware of the pitfalls of nonrandomized studies. Without randomization, cohort studies have limited ability to account for confounders. A causal relationship is inferred from the chronologic sequence of exposures and outcomes, which is most convincing in prospective studies. Without blinding or allocation concealment, cohort studies are also susceptible to selection biases.
Recall bias is a limitation for retrospective cohort studies because of factors such as inaccurate patient recall and reliance on information from patient charts.41 Variability in documentation practices makes the fidelity of large databases particularly concerning. In case-control studies, the inference of causation is especially problematic because comparison groups are selected based on their outcomes and analyzed retrospectively for exposures. Designing an observational study with a control group requires paying special attention to these inherent limitations and must be individualized to the research question. Exposures, outcomes, inclusion criteria, and statistical methods should all be planned in advance, even for studies meant to be “exploratory” or hypothesis-generating. Observational studies should be evaluated using standardized guidelines such as the Strengthening the Reporting of Observational Studies in Epidemiology framework.42,43 Retrospective cohort studies lend themselves to studying rare conditions and procedures, particularly with the availability of large national databases. Vast amounts of data can be extracted and analyzed rapidly. Meanwhile, case-control studies are useful for investigating factors underlying rare outcomes or complications. When using a registry or database, its purpose, completeness, and accuracy should be carefully evaluated.44 If the question would be better suited for an RCT and there are no major practical or ethical barriers, a higher level of evidence design should be pursued. 
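Multivariable adjustment for a confounder, of the kind the cohort study above performed with logistic regression, can be illustrated on simulated data. The scenario, variable names, and effect sizes below are entirely hypothetical, and the hand-rolled fitting routine is a minimal sketch; in practice a statistics package (e.g., statsmodels or R) would be used:

```python
import math
import random

def fit_logistic(X, y, lr=1.0, iters=1500):
    """Plain gradient-ascent logistic regression; returns coefficients
    [intercept, b1, b2, ...]. A minimal stand-in for a statistics package."""
    n = len(X)
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(iters):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            z = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
            p = 1.0 / (1.0 + math.exp(-z))
            grad[0] += yi - p
            for j, x in enumerate(xi):
                grad[j + 1] += (yi - p) * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Hypothetical simulated cohort: smoking confounds a mesh-vs-no-mesh
# comparison because smokers are both less likely to receive mesh AND more
# likely to develop an infection; infection risk here depends on smoking only.
random.seed(0)
X, y = [], []
for _ in range(600):
    smoker = 1.0 if random.random() < 0.3 else 0.0
    mesh = 1.0 if random.random() < (0.3 if smoker else 0.6) else 0.0
    infection = 1.0 if random.random() < (0.05 + 0.15 * smoker) else 0.0
    X.append([mesh, smoker])
    y.append(infection)

beta = fit_logistic(X, y)
# After adjustment, the mesh coefficient (beta[1]) should sit near zero,
# while the smoking coefficient (beta[2]) captures the true risk factor.
```

A crude comparison of infection rates by mesh status alone would make mesh look protective here, because mesh patients are disproportionately nonsmokers; including the confounder in the model removes that spurious association.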
CASE SERIES

An appraisal of articles published in the general surgery literature in 1996 compared the preponderance of case series, 46% of all studies, to “comic opera,” pointing to a necessary shift toward higher quality research.6 A similar survey conducted 10 years later found that the proportion of case series had decreased to 34%, with 51% of publications being cohort studies.5 Case series, which typically involve a retrospective account of outcomes from a single procedure, often from a single institution and surgeon, remain a common avenue for publishing novel information. The absence of a control group is their primary flaw, as no hypothesis can be tested.44 Retrospective case series are also culprits of publication bias, as new procedures with poor outcomes are unlikely to be published.45–47 Furthermore, failures to replicate the favorable results of new techniques are unlikely to be published, raising concerns about the validity of a case series from a single institution. Case series are valuable for providing detailed analysis for subsequent hypothesis generation.44 They should be undertaken with specific goals, such as assessing the safety of a novel procedure or documenting rare occurrences or complications. The aforementioned IDEAL model considers a case series appropriate in the idea or development stage, when first experiences with a new procedure are documented, a learning curve takes place, and refinements are made to the surgical technique.10,11 To circumvent some of the pitfalls discussed above, prospective series should be performed whenever possible, with protocols for the new procedure registered and approved in advance of conducting the study.
Unsuccessful studies should also be registered to limit publication bias.48 The use of retrospective series should be limited; at a minimum, they should involve consecutive patients without exclusions and follow a standardized reporting protocol.11 As with RCTs and observational studies, validated checklists can be used to minimize bias when reporting novel surgical interventions in a case series.49

A simple approach can be used to choose the best research design for a surgical question. Starting with the IDEAL stage of the procedure being investigated, the highest level of inquiry appropriate for that stage should be evaluated for feasibility before considering designs at lower levels of the evidence pyramid. RCTs provide the highest level of rigor but are not suited to many questions. Other methodologies, such as observational studies and case series, are valuable alternatives when practical and ethical considerations preclude the use of an RCT. Irrespective of the study type, surgical research should emphasize quality design and minimize bias.

DISCLOSURE

Dr. Chung receives funding from the National Institutes of Health, book royalties from Wolters Kluwer and Elsevier, and a research grant from Sonex to study carpal tunnel outcomes. Dr. Florczynski has no financial interest to declare in relation to the content of this article.

ACKNOWLEDGMENTS

The authors would like to thank Meghan Cichocki for edits and contributions to the development of this article. They appreciate the peer review and edits from Mike Stokes, staff vice-president of communications at the American Society of Plastic Surgeons.
