Abstract

Introduction: Clinical research using genomic datasets, such as AACR Project GENIE, requires outcomes such as cancer progression and response to contextualize molecular information. We are developing the “PRISSMM” (Pathology, Radiology/Imaging, Signs/Symptoms, Medical oncologist assessment, and tumor Markers) framework for clinical curation of genomic data. Natural language processing (NLP) models based on this framework could accelerate curation of reproducible endpoints. However, the application of NLP at scale to extract outcomes from oncologist notes, which mix historical and current information, has been limited to date.

Methods: Medical oncologists' progress notes were reviewed for patients with lung cancer whose tumors were sequenced through an institutional precision medicine study from 2013-2018. For each note, curators recorded whether the assessment/plan indicated the presence of (a) any cancer, (b) progression/worsening of disease, and/or (c) response to therapy/improvement of disease. Next, a recurrent neural network was trained to extract the assessment/plan from each note. Finally, convolutional neural networks were trained on the assessments/plans to predict the probability that each curated outcome was present. Model performance was evaluated among a held-out 10% test subset of patients using the area under the receiver-operating characteristic curve (AUC) and area under the precision-recall curve (AUPRC). Associations between curated response or progression endpoints (generated using 10-fold cross-validation) and overall survival were measured using Cox models, treating the endpoints as time-varying covariates, among patients receiving palliative-intent systemic therapy.

Results: Results among 7,597 curated notes for 919 patients are indicated in the Table.
| Endpoint | AUC of NLP models for identifying endpoint in the test set | Proportion of manually curated notes with endpoint | AUPRC of NLP models for identifying endpoint in the test set | HR (95% CI) for mortality associated with endpoint, as manually curated, among patients receiving palliative-intent treatment | HR (95% CI) for mortality associated with endpoint, as predicted using NLP models using F1-optimal threshold probabilities |
| --- | --- | --- | --- | --- | --- |
| Any evidence of lung cancer | 0.94 | 0.77 | 0.97 | N/A | N/A |
| Progression | 0.86 | 0.20 | 0.65 | 2.93 (2.33-3.67) | 2.49 (2.00-3.09) |
| Response to treatment | 0.90 | 0.12 | 0.57 | 0.70 (0.47-1.03) | 0.45 (0.30-0.67) |

Conclusion: Neural network NLP models can extract meaningful outcomes from oncologist notes for clinical curation of electronic health records at scale.

Citation Format: Kenneth L. Kehl, Wenxin Xu, Haitham A. Elmarakeby, Michael J. Hassett, Jackson Nyman, Bruce E. Johnson, Eliezer M. Van Allen, Deb Schrag. Deep natural language processing for automated ascertainment of cancer outcomes from clinician progress notes [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 2063.
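
Illustrative sketch: The Methods evaluate predicted probabilities with the AUC, and the Table reports hazard ratios for NLP-predicted endpoints binarized at an F1-optimal threshold. The abstract does not say how that threshold was selected or on which data, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. The function names (`roc_auc`, `f1_at_threshold`, `f1_optimal_threshold`) and the toy labels and scores are hypothetical and invented for illustration; they are not study data.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive note scores higher; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_at_threshold(labels: Sequence[int], scores: Sequence[float],
                    threshold: float) -> float:
    """F1 score when a note is called positive if score >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def f1_optimal_threshold(labels: Sequence[int],
                         scores: Sequence[float]) -> Tuple[float, float]:
    """Scan every observed score as a candidate cutoff and return the
    (threshold, F1) pair with the highest F1 (lowest cutoff wins ties)."""
    best_t, best_f1 = 1.0, -1.0
    for t in sorted(set(scores)):
        f1 = f1_at_threshold(labels, scores, t)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical toy data: 1 = curator marked the endpoint as present.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.05, 0.20, 0.35, 0.40, 0.60, 0.70, 0.15, 0.90]

auc = roc_auc(labels, scores)
threshold, best_f1 = f1_optimal_threshold(labels, scores)
# Binarized endpoint calls, as would feed a downstream survival model.
predicted = [int(s >= threshold) for s in scores]
print(f"AUC={auc:.4f} threshold={threshold} F1={best_f1:.4f}")
```

In practice the threshold would normally be chosen on data separate from the data used to report performance. The pairwise AUC loop is quadratic in the number of notes and is written for clarity, not for datasets of this study's size.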
