Abstract

For nearly three decades, computed tomography (CT) imaging has been a critical assessment tool for cancer therapeutics. In the article that accompanies this editorial, Oxnard et al report an unprecedented study of the performance characteristics of CT. The results have important implications for the future of drug development, but our outdated systems for reporting the status of solid tumors challenged the investigators to explain how these CT data can best be used. The Response Evaluation Criteria in Solid Tumors (RECIST), which define the terminology of response and progression, constrained the authors’ discussion of how their data can best advance our understanding of malignant disease in humans. For the early development of new cancer therapeutics, it is time to replace these systems with more innovative, quantitative approaches that could define relationships between solid tumors, disease progression, and therapeutic outcomes in patients.

Oxnard et al determined the intermeasurement variance of CT for primary malignant lung lesions. Thirty patients with non–small-cell lung cancer (NSCLC) underwent noncontrast CT, exited the scanner, and were reimaged on the same scanner after a brief interval. Images from both scans were presented blindly (without mention of the time between acquisitions) to three radiologists, who manually measured the longest dimension of each target lesion on the two scans with standard software. Lesions ranged from 1 to 8 cm in size. The absolute difference between the two measurements of a single lesion ranged from 0 to 9 mm; the largest absolute differences occurred with the largest lesions, and the greatest fractional differences occurred with the smallest lesions.
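The contrast between absolute and fractional disagreement can be made concrete with a small helper. This is a minimal sketch, not Oxnard et al's analysis: it assumes the fractional difference is taken relative to the mean of the two readings (the editorial does not say which denominator was used), and the function name and lesion pairs are hypothetical, chosen only to span the 1- to 8-cm range described above.

```python
def paired_differences(scan1_mm, scan2_mm):
    """Return (absolute difference in mm, fractional difference) for one lesion.

    Assumption: the fractional difference is relative to the mean of the two
    readings; other denominators (e.g., the first reading) are also used.
    """
    absolute_mm = abs(scan1_mm - scan2_mm)
    mean_mm = (scan1_mm + scan2_mm) / 2.0
    return absolute_mm, absolute_mm / mean_mm


if __name__ == "__main__":
    # Hypothetical paired readings (mm), not data from Oxnard et al.
    for scan1, scan2 in [(80.0, 71.0), (40.0, 37.0), (12.0, 10.0)]:
        absolute_mm, fraction = paired_differences(scan1, scan2)
        print(f"{scan1:5.1f} vs {scan2:5.1f} mm: {absolute_mm:.1f} mm, {100 * fraction:.1f}%")
```

With these illustrative pairs, the 80- vs 71-mm lesion disagrees by 9 mm but only about 12%, whereas the 12- vs 10-mm lesion disagrees by 2 mm yet about 18%, mirroring the pattern the study reports.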
Oxnard et al then examined the potential impact of this measurement variance by simulating 1,000 replicates with standard statistical software, binning the lesions into size groups, and reporting the standard deviation and 95% limits of agreement for each bin. For example, the authors report a standard deviation of 2.3 mm for lesions between 3 and 5 cm. On that basis, they estimated that intermeasurement variance alone could produce measurements of a 4-cm lesion anywhere from 3.5 to 4.5 cm, a 12% change due solely to measurement variability. Although spurious assessments of partial response and progressive disease can arise from measurement variance alone, the RECIST thresholds keep these events infrequent.

The signal-to-noise ratio of tumor measurements, the problem addressed by this investigation, has major ramifications for the early development of new cancer therapeutics, especially phase II trials. Early clinical development has two goals: first, to learn as much as possible about how best to administer the new agent to achieve clear and consistent benefit to patients (learning); and second, to determine whether further development of the drug or drug combination is likely to yield net benefit to patients (determining). When a drug causes obvious and sustained tumor shrinkage in multiple patients, or clearly shows no evidence of benefit and has obvious toxicity, the decision about further development is simple. RECIST is a categorical system of treatment assessment that provides a uniform structure with which institutions internationally can screen for obviously poor results and confirm obviously good ones. Categorical systems assess anticancer activity in terms of response or progression and control the noise of tumor measurement imprecision by setting thresholds for change in total tumor size that lie beyond typical measurement variance.
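The arithmetic behind the 4-cm example, and the claim that categorical thresholds keep noise-driven misclassification rare, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' simulation: it assumes the reported 2.3-mm standard deviation describes the difference between two repeat measurements, that this difference is normally distributed with mean zero, and that 95% limits of agreement are ±1.96 SD. It applies RECIST 1.1 target-lesion thresholds (a decrease of at least 30% for partial response; an increase of at least 20% and at least 5 mm for progression), which the editorial does not itself spell out, to a single lesion whose baseline is treated as its nadir. The helper names are hypothetical.

```python
import random

# Assumption: the reported 2.3-mm SD (3- to 5-cm lesions) is the SD of the
# difference between two repeat measurements, normally distributed with mean 0.
SD_DIFF_MM = 2.3
Z_95 = 1.96  # standard normal quantile used for 95% limits of agreement


def limits_of_agreement(lesion_mm, sd_diff_mm=SD_DIFF_MM):
    """Return the (low, high) range of a repeat measurement of an unchanged lesion."""
    half_width = Z_95 * sd_diff_mm
    return lesion_mm - half_width, lesion_mm + half_width


def recist_category(baseline_mm, followup_mm):
    """Classify a single-lesion change using RECIST 1.1-style thresholds.

    Hypothetical helper: RECIST itself uses the sum of target-lesion diameters
    and judges progression from the nadir; here the baseline is the nadir.
    """
    change_mm = followup_mm - baseline_mm
    if followup_mm <= 0:
        return "CR"  # lesion measured as absent
    if change_mm <= -0.30 * baseline_mm:
        return "PR"  # decrease of at least 30%
    if change_mm >= 0.20 * baseline_mm and change_mm >= 5.0:
        return "PD"  # increase of at least 20% and at least 5 mm
    return "SD"


def noise_only_rates(lesion_mm, n_trials=100_000, seed=2011):
    """Re-measure an unchanged lesion many times and tally the resulting categories."""
    rng = random.Random(seed)
    counts = {"CR": 0, "PR": 0, "SD": 0, "PD": 0}
    for _ in range(n_trials):
        # The only change between readings is the assumed measurement noise.
        followup_mm = lesion_mm + rng.gauss(0.0, SD_DIFF_MM)
        counts[recist_category(lesion_mm, followup_mm)] += 1
    return {category: n / n_trials for category, n in counts.items()}


if __name__ == "__main__":
    low, high = limits_of_agreement(40.0)
    print(f"4-cm lesion, 95% limits: {low:.1f} to {high:.1f} mm "
          f"(+/-{100 * (high - 40.0) / 40.0:.1f}%)")
    print(noise_only_rates(40.0))
```

Under these assumptions the limits come out at about 35.5 to 44.5 mm (±11.3%), consistent after rounding with the 3.5 to 4.5 cm range quoted above, and the simulated share of spurious partial-response or progression calls for an unchanged 4-cm lesion is well under 1%, because both thresholds sit several standard deviations away from zero change.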
These thresholds are based on studies from the 1970s of interexaminer variance in physical examination, a time when few solid tumors had an established standard of care, most drugs were developed specifically to shrink tumors by killing cells rapidly, and our biotechnical capacity to establish why a drug did or did not work was limited. For single-arm trials, categorical assessment of tumor response was a reasonable, simple, and efficient approach to the early development of anticancer drugs.

With current digital imaging and computer processing technology and ready access to CT, early cancer therapeutics development is poised for rapid, radical change that improves efficiency in both learning and determining. As Oxnard et al note, to generate some evidence that net benefit is likely, “phase II trials ... increasingly present waterfall plots showing individual measurement changes for each patient.” In the first publication of a phase II trial with a waterfall plot, the intent was to demonstrate that a drug with an acceptable RECIST response rate clearly had additional disease-controlling effects in patients classified as nonresponders. In that randomized discontinuation trial, the waterfall plot was an effort to illuminate what had been learned about the drug’s anticancer activity. The waterfall plot represents a limited effort to treat CT measurements as a continuous rather than …

JOURNAL OF CLINICAL ONCOLOGY, EDITORIALS, VOLUME 29, NUMBER 23, AUGUST 1
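The continuous view that a waterfall plot preserves can be sketched in a few lines. This is a minimal illustration under assumptions the editorial does not state: each patient's bar is the best (most negative) percentage change from baseline in the sum of target-lesion diameters, the -30% cutoff is the RECIST 1.1 partial-response threshold, and the `Patient` class, helper names, and cohort are all hypothetical. It prints a text waterfall rather than a graphic so that it runs with the standard library alone.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: str
    baseline_mm: float         # baseline sum of target-lesion diameters
    followups_mm: list[float]  # sums of diameters at each on-study assessment


def best_percent_change(patient):
    """Most favorable (most negative) percentage change from baseline."""
    return min(100.0 * (f - patient.baseline_mm) / patient.baseline_mm
               for f in patient.followups_mm)


def waterfall_order(patients):
    """One bar per patient, sorted from largest increase to largest decrease."""
    bars = [(p.patient_id, best_percent_change(p)) for p in patients]
    return sorted(bars, key=lambda bar: bar[1], reverse=True)


def shrinking_nonresponders(bars, pr_threshold=-30.0):
    """Patients with some shrinkage that a categorical PR call would not capture."""
    return [pid for pid, pct in bars if pr_threshold < pct < 0.0]


# Hypothetical cohort for illustration only.
PATIENTS = [
    Patient("P01", 50.0, [52.0, 55.0]),
    Patient("P02", 80.0, [70.0, 60.0]),
    Patient("P03", 40.0, [30.0, 26.0]),
    Patient("P04", 60.0, [66.0, 75.0]),
    Patient("P05", 100.0, [90.0, 88.0]),
]

if __name__ == "__main__":
    bars = waterfall_order(PATIENTS)
    for pid, pct in bars:
        bar = "#" * round(abs(pct) / 2.5)  # one '#' per 2.5 percentage points
        print(f"{pid} {pct:+6.1f}% {bar}")
    print("Shrinkage without PR:", shrinking_nonresponders(bars))
```

With these made-up values, P05 and P02 shrink by 12% and 25%, so they appear as shrinking bars even though neither crosses the -30% threshold; a purely categorical summary would report both simply as nonresponders.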
