Abstract

e21102

Background: To assess drug efficacy in clinical trials with imaging endpoints, the FDA recommends blinded independent central review (BICR) with double reads to ensure robust image interpretation. However, the discordance created by double reading adds burden and cost that stakeholders strive to minimize. Given that the selection of tumors at baseline is reported to cause one third of adjudications, we tested a set of risk factors arising from baseline evaluations and trained models to predict discrepancies in RECIST responses.

Methods: We pooled and retrospectively analyzed data from five lung cancer clinical trials that used RECIST with BICR and double reads, including a total of 1720 patients. First, we analyzed the distribution of four kinds of response discrepancy (KoD): progression or date of progression (PD), response or date of response (OR), progression or response (ANY), and best response (BR). As key indicators of the at-risk period of discordance, we derived the discrepancy rate, the average time-point of the first occurrence of each KoD, and the proportion of time-points before the KoD out of the total number of time-points (compared using the Marascuilo procedure). Second, from the baseline evaluations performed by readers, we computed the odds ratios of 32 risk factors for discrepancies, including differences in measurements, tumor selection, and disease locations. Finally, based on 77 features, we trained a Random Forest (RF) model and a deep learning (DL) model to predict KoDs.

Results: On average, for PD, OR, ANY, and BR respectively, discrepancy rates were 41.5%, 49.1%, 66.3%, and 28.7%; the time-point of first occurrence was 4.6, 2.7, 2.7, and 4.8; and the proportion of time-points before the KoD was 30.7%, 24.7%, 33.1%, and 23.2% (significantly different). The main risk factors for discrepancies were missed detection of measurable disease, completely different disease selection, missed selection of lung tumors, and a difference between readers in the sum of diameters of between 10% and 20%. However, associations were weak, and there was no association when one of the readers selected infrequent disease locations (e.g., skin, gastric). DL outperformed RF, but classification performance was generally poor. PD was not predictable. For BR, the positive predictive value was 81.0 [95% CI: 78.8; 83.2] and the negative predictive value was 73.3 [95% CI: 72.8; 73.8].

Conclusions: The KoDs have different distributions, but all occur in the first third of the patient evaluation, which defines a risk period. We confirmed the reported association of selection and measurement at baseline with variability in responses. However, the associations were weak and did not allow good prediction of that variability. For BICR of lung cancer trials using RECIST, our results support monitoring variability with a focus on the baseline and the beginning of patient evaluation.
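The two kinds of statistic reported above, odds ratios for baseline risk factors and positive/negative predictive values for a classifier, can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names, the Wald confidence interval, and every count used below are hypothetical and chosen only to show the arithmetic.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    Hypothetical layout (not taken from the abstract):
      a: risk factor present, discrepancy observed
      b: risk factor present, no discrepancy
      c: risk factor absent,  discrepancy observed
      d: risk factor absent,  no discrepancy
    z=1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def ppv_npv(tp, fp, tn, fn):
    """Positive and negative predictive value from confusion-matrix counts."""
    ppv = tp / (tp + fp)  # fraction of predicted discrepancies that were real
    npv = tn / (tn + fn)  # fraction of predicted agreements that were real
    return ppv, npv


if __name__ == "__main__":
    # Illustrative counts only; they do not come from the pooled trials.
    or_, lo, hi = odds_ratio_ci(20, 10, 10, 20)
    print(f"OR = {or_:.2f}, 95% CI [{lo:.2f}; {hi:.2f}]")
    ppv, npv = ppv_npv(tp=81, fp=19, tn=73, fn=27)
    print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

An odds ratio whose interval includes 1 is compatible with no association, which is one way to read the "weak associations" described in the Results.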
