Abstract
Radiotherapy (RT) toxicities can impair survival and quality of life, yet their risk factors and optimal management are under-studied. Real-world evidence holds enormous potential to improve our understanding of RT adverse events, but this information is often documented only in clinic notes and cannot, at present, be extracted automatically. To address this unmet need, we developed natural language processing (NLP) algorithms to automatically identify the presence and severity of esophagitis from the notes of patients treated with thoracic RT. Our corpus consisted of (1) a gold-labeled dataset of 1524 clinic notes from 124 lung cancer patients treated with RT (in-domain), manually annotated for CTCAE v5.0 esophagitis grade, and (2) a silver-labeled dataset of 2420 notes from 1832 patients for whom toxicity grades had been collected as structured data during clinical care. We developed a fine-tuning pipeline for pre-trained BERT-based neural models for 3 tasks: (1) classifying the presence of esophagitis, (2) classifying grade 0-1 vs. ≥2 esophagitis, and (3) classifying grade 0 vs. 1 vs. 2-3. A note sectionizer was used to let the model focus on the most informative sections. For out-of-domain transferability testing, we performed independent validation in a separate clinical cohort: a manually annotated dataset of 345 notes from 75 esophageal cancer patients treated with RT. We also report patient-level results by evaluating the maximum predicted grade per patient. Fine-tuning PubMedBERT yielded the best-performing models. Performance is shown in the table. Selecting the most informative note sections (primarily Interval History and Assessment & Plan) during fine-tuning improved macro-F1 by ≥2% for all tasks. Including silver-labeled data improved macro-F1 by ≥3% across all tasks.
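The three classification targets, the patient-level aggregation, and the macro-F1 metric described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' pipeline: the function names (`task_labels`, `patient_level_max`, `macro_f1`) and the dictionary keys are hypothetical, and grades are assumed to be integers from 0 to 3, as implied by the task definitions.

```python
def task_labels(grade):
    """Map a CTCAE esophagitis grade to the labels of the three tasks.

    Assumption: grades are integers in 0..3, matching the "0 vs. 1 vs. 2-3" task.
    """
    if grade not in (0, 1, 2, 3):
        raise ValueError(f"unexpected grade: {grade!r}")
    return {
        "presence": int(grade >= 1),      # task 1: any esophagitis vs. none
        "grade_2_plus": int(grade >= 2),  # task 2: grade 0-1 vs. >=2
        "three_class": min(grade, 2),     # task 3: 0 vs. 1 vs. 2-3 (2 and 3 merged)
    }


def patient_level_max(note_predictions):
    """Collapse note-level predictions to one grade per patient.

    note_predictions is an iterable of (patient_id, predicted_grade) pairs;
    each patient receives the maximum grade predicted across their notes.
    """
    best = {}
    for patient_id, grade in note_predictions:
        best[patient_id] = max(grade, best.get(patient_id, grade))
    return best


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, `patient_level_max([("a", 0), ("a", 2), ("b", 1)])` assigns grade 2 to patient "a" and grade 1 to patient "b". Because macro-F1 weights every class equally, a rare higher-grade class counts as much as the common grade-0 class.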
To the best of our knowledge, this is the first effort to automatically extract toxicity severity according to CTCAE guidelines from clinic notes, providing proof-of-concept for NLP to support detailed toxicity reporting. Fine-tuning on note sections and leveraging silver-labeled data enabled promising performance despite small datasets, informing future research into NLP for automated toxicity monitoring. Future work will extend these methods to other cancer diagnoses and toxicities, and to toxicity risk prediction.
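To make the idea of fine-tuning on note sections concrete, the following is a minimal sketch of a header-based sectionizer in Python. The abstract does not describe how the actual sectionizer works, so everything here is an assumption: the regular-expression approach, the names `split_sections` and `informative_text`, and the "Physical Exam" header are hypothetical. Only "Interval History" and "Assessment & Plan" are taken from the text.

```python
import re

# Hypothetical header inventory; real clinic notes vary widely in format.
SECTION_HEADERS = ["Interval History", "Physical Exam", "Assessment & Plan"]
# Sections the abstract names as most informative.
KEEP = {"Interval History", "Assessment & Plan"}

_CANONICAL = {h.lower(): h for h in SECTION_HEADERS}
# A header is assumed to sit alone on its line, optionally followed by a colon.
_HEADER_RE = re.compile(
    r"^[ \t]*(" + "|".join(re.escape(h) for h in SECTION_HEADERS) + r")[ \t]*:?[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def split_sections(note):
    """Return a dict mapping each recognized header to the text that follows it."""
    sections = {}
    matches = list(_HEADER_RE.finditer(note))
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(note)
        name = _CANONICAL[match.group(1).lower()]
        sections[name] = note[match.end():end].strip()
    return sections


def informative_text(note):
    """Keep only the selected sections; fall back to the full note if none are found."""
    sections = split_sections(note)
    kept = [sections[h] for h in SECTION_HEADERS if h in KEEP and h in sections]
    return "\n".join(kept) if kept else note


# Toy note, invented for illustration only.
toy_note = (
    "Interval History:\n"
    "Patient reports pain with swallowing.\n"
    "Physical Exam:\n"
    "No acute distress.\n"
    "Assessment & Plan:\n"
    "Grade 2 esophagitis.\n"
)
```

In this sketch, `informative_text(toy_note)` returns the Interval History and Assessment & Plan text and drops the Physical Exam section. The filtered text would then be passed to the model's tokenizer, which also helps long notes fit within a BERT-style input length limit.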
More From: International Journal of Radiation Oncology*Biology*Physics