Abstract 11000
Background: Clinical outcomes such as response, progression, and metastasis are crucial data for observational cancer research, but outside of clinical trials they are usually recorded only in unstructured notes in electronic health records (EHRs). Manual EHR annotation is too resource-intensive to scale to large datasets. Individual cancer centers have trained artificial intelligence natural language processing (AI/NLP) models to extract outcomes from their own EHRs. However, because of concerns that models trained on protected health information (PHI) might encode private data, such models usually cannot be exported to other centers.
Methods: EHR data from Dana-Farber Cancer Institute (DFCI) and Memorial Sloan Kettering (MSK), collected through the AACR Project GENIE Biopharma Collaborative, were used to train and evaluate Bidirectional Encoder Representations from Transformers (BERT)-based NLP models that extract outcomes from imaging reports and oncologist notes annotated with the Pathology, Radiology/Imaging, Signs/Symptoms, Medical oncologist, and bioMarkers (PRISSMM) framework. Document-level outcomes included the presence of active cancer, response, progression, and metastatic sites. ‘Teacher’ models trained on DFCI EHR data were used to label imaging reports and discharge summaries from the public MIMIC-IV dataset. ‘Student’ models, trained on the MIMIC documents to predict the teacher labels, were then transferred to MSK for evaluation.
Results: Teacher models were trained at DFCI on 30,332 imaging reports from 2609 patients and 32,173 oncologist notes from 2917 patients with non-small cell lung, colorectal, breast, prostate, pancreatic, or urothelial cancer. The teacher models were then used to label 217,642 imaging reports and 141,377 discharge summaries from MIMIC for student model training. When evaluated at MSK, the student models derived from these DFCI-trained teachers demonstrated high discrimination (area under the receiver operating characteristic curve [AUROC] > 0.90) across outcomes.
Conclusions: This privacy-preserving “teacher-student” AI/NLP framework could expedite linkage of genomic data to clinical outcomes across institutions, accelerating precision cancer research. [Table: see text]
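The teacher-student transfer described in the Methods can be illustrated with a small sketch. This is not the authors' pipeline: it swaps the BERT models for a bag-of-words logistic regression, and it replaces the DFCI, MIMIC-IV, and MSK documents with a tiny hypothetical corpus. All names (`train_logreg`, `private_docs`, `public_docs`, `external_test`) are hypothetical. What it aims to show is the data flow: the teacher is fit on private labeled notes, it scores an unlabeled public corpus, the student is fit only to that public text and the teacher's probabilities, and the student is then scored on another site's documents with AUROC.

```python
import math
from collections import Counter


def featurize(text):
    """Bag-of-words token counts (toy stand-in for a BERT encoder; an assumption)."""
    return Counter(text.lower().split())


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(model, text):
    weights, bias = model
    z = bias + sum(weights.get(tok, 0.0) * n for tok, n in featurize(text).items())
    return sigmoid(z)


def train_logreg(docs, targets, epochs=200, lr=0.5):
    """Logistic regression fit by per-document gradient descent on cross-entropy.

    Targets may be hard 0/1 labels or soft probabilities in [0, 1].
    """
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for doc, y in zip(docs, targets):
            err = predict((weights, bias), doc) - y  # d(loss)/dz for cross-entropy
            bias -= lr * err
            for tok, n in featurize(doc).items():
                weights[tok] = weights.get(tok, 0.0) - lr * err * n
    return weights, bias


def auroc(scores, labels):
    """P(random positive scores above random negative); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy documents; label 1 = progression documented, 0 = not.
private_docs = [  # stand-in for labeled notes that never leave the teacher site
    ("new lesions consistent with progression", 1),
    ("enlarging nodules concerning for progression", 1),
    ("stable disease without change", 0),
    ("decreased lesions consistent with response", 0),
]
public_docs = [  # stand-in for an unlabeled public corpus such as MIMIC-IV
    "interval progression with new lesions",
    "stable appearance without change",
    "enlarging mass concerning for progression",
    "decreased size consistent with response",
]
external_test = [  # stand-in for annotated documents at a second institution
    ("new enlarging lesions with progression", 1),
    ("stable without change", 0),
    ("progression with enlarging mass", 1),
    ("decreased size with response", 0),
]

# 1. Teacher: trained only on the private labeled data.
teacher = train_logreg([d for d, _ in private_docs], [y for _, y in private_docs])

# 2. Teacher labels the public corpus with soft probabilities.
teacher_probs = [predict(teacher, d) for d in public_docs]

# 3. Student: trained only on public text plus teacher probabilities;
#    only the student's weights would be shared with other sites.
student = train_logreg(public_docs, teacher_probs)

# 4. External evaluation of the student with AUROC.
test_scores = [predict(student, d) for d, _ in external_test]
student_auc = auroc(test_scores, [y for _, y in external_test])
print(f"student AUROC on toy external set: {student_auc:.2f}")
```

Fitting the student to the teacher's probabilities rather than to thresholded 0/1 labels is one common distillation choice and is an assumption here; the abstract does not state which form of teacher label was used. The property the sketch preserves is that the student is never trained on `private_docs` directly.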