Abstract

Commercials might claim a product is “safe and effective,” but most research studies should not. Small, focused studies may deem one treatment more effective than another, but problems arise when the authors of such studies claim safety based on the observation that few (or no) patients were harmed by the intervention. Such claims can be misleading. It is almost impossible to evaluate safety and efficacy in the same study. There are many reasons for this, but the key ones are (1) the elements of study design needed to evaluate efficacy differ from those needed to evaluate safety, and (2) demonstrating safety requires evaluating many more patients than demonstrating efficacy does.

There are several important study-design differences between safety and efficacy studies. Whereas efficacy studies can focus on specific endpoints like “Does ligament reconstruction decrease the likelihood of subsequent inversion injury to the tibiotalar joint?”, safety studies must remain open to the possibility that many different kinds of adverse effects might occur, not all of which will be evident at the time of treatment or even at the conclusion of an efficacy study. For example, the discovery of systemic effects of local procedures like THA [10], in particular THAs with metal-on-metal bearings [6], drug interactions or unexpected complications from pharmacologic treatments [12], and unanticipated modes of failure [3] all have changed our views about potentially promising treatments, in some cases even after shorter-term efficacy trials had immodestly claimed safety. This last point is important: While efficacy can be demonstrated quickly, we often do not learn about the harms our interventions cause until much later.

Because serious complications of our treatments are, thankfully, uncommon, safety studies must be much larger to have a fair likelihood of detecting them. For example, the studies supporting the multimodal analgesia approaches now in common use after orthopaedic surgery tend to have two things in common: They apply several classes of medications to a population, and they almost always are small [4, 11]. Because the study populations often include older patients, many of whom also take other medications, we take a risk when we infer safety from small studies designed to evaluate efficacy and then apply a complex protocol to a complex population on a large scale. One study [11] on the efficacy of a particular multimodal analgesia approach, which also claimed it “confirmed the safety” of its protocol, prescribed at least five drug classes (including two different NSAIDs) and involved a cocktail containing drugs from three classes, injected into six different kinds of tissue around the knee. That study was powered to detect a clinically important difference in patients’ pain, and with a total of only 42 patients, it was able to detect such a difference. But with fewer than four dozen patients, efficacy studies of this sort should make no claims about safety. Contrast this with a key meta-analysis that concluded the NSAID rofecoxib (Vioxx) was unsafe: That study required data from more than 20,000 patients [2] to draw definitive conclusions, and it still was controversial [5].
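The arithmetic of rare events makes the point concrete. The minimal sketch below (in Python, using illustrative numbers that are assumptions rather than data from the cited studies) applies the standard “rule of three” to a 42-patient trial with zero observed complications, and then computes how many patients one would need to have a 95% chance of observing even a single 1-in-1,000 adverse event.

import math

def rule_of_three_upper_bound(n: int) -> float:
    """Approximate 95% upper confidence bound on an event rate when
    0 events are observed among n patients (the "rule of three")."""
    return 3.0 / n

def n_to_observe_one(p: float, prob_detect: float = 0.95) -> int:
    """Smallest n with P(at least one event) >= prob_detect, given each
    patient independently experiences the event with probability p."""
    # P(at least one event) = 1 - (1 - p)^n >= prob_detect
    return math.ceil(math.log(1.0 - prob_detect) / math.log(1.0 - p))

# A 42-patient trial with zero observed complications (illustrative):
print(f"Upper 95% bound on the true complication rate: {rule_of_three_upper_bound(42):.1%}")
# -> about 7.1%; a harm affecting 1 patient in 14 could easily go unseen

# Patients needed for a 95% chance of seeing even one 1-in-1,000 event:
print(f"Patients needed: {n_to_observe_one(0.001)}")
# -> about 3,000 -- and comparing harm rates between treatments needs far more

In other words, “no complications in 42 patients” is statistically consistent with a complication rate of roughly 7%, which is why small efficacy trials cannot support claims of safety.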
Registries offer another potential window into the safety of some of the tools we use; they do this by drawing on data from large populations of patients [1]. The postmarketing surveillance required by the FDA includes a database of hundreds of thousands of new reports each year of confirmed or possible device-associated serious illnesses, deaths, and malfunctions, drawn from the experiences of millions of patients [8]. While the FDA has definitions for the kinds of evidence required to declare a device “safe” [9], we now know that many devices thus vetted turn out not to be safe at all [7], placing the burden back on us, as clinicians, to know the difference between a study that demonstrates safety and one that demonstrates efficacy.

It is important to remember, though, that studies whose size, scope, and duration genuinely permit answering safety questions generally do so at the expense of patient-level detail about efficacy. To evaluate hip scores after femoroacetabular impingement surgery, the likelihood of return to sport after shoulder arthroscopy, or range of motion after basilar joint arthroplasty of the thumb, smaller trials often suffice and may allow a more granular examination of the dataset. Questions like those often can be answered by studies enrolling anywhere from a few dozen to a couple hundred patients (a back-of-the-envelope power calculation below illustrates why). But if small efficacy studies fail to identify any patients who were harmed by the intervention, one should not conclude that the intervention is safe. Safety and efficacy both are important, but evaluating each requires a different kind of study. Beware of studies that, like commercials, claim a treatment to be both “safe and effective.”
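As a companion to the rare-event sketch above, here is a minimal sketch of why a few dozen patients can suffice for an efficacy endpoint. It uses the usual normal-approximation formula for comparing two means; the effect size and standard deviation are illustrative assumptions, not values from any cited trial.

from math import ceil
from statistics import NormalDist

def n_per_group(delta: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group sample size for comparing two means
    (normal approximation): n = 2 * (z_{alpha/2} + z_beta)^2 * (sd / delta)^2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)  # two-sided significance threshold
    z_beta = z.inv_cdf(power)               # desired power
    return ceil(2.0 * (z_alpha + z_beta) ** 2 * (sd / delta) ** 2)

# Detecting a 2-point difference on a 10-point pain scale, SD = 2 (assumed):
print(n_per_group(delta=2.0, sd=2.0))  # -> 16 per group, i.e., roughly 32-40 patients total

Under these assumed numbers, roughly 16 patients per group give 80% power to detect the difference in pain, consistent with efficacy trials of a few dozen patients; detecting a difference in uncommon harms, by contrast, would require thousands.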
