Deep Learning-Based Natural Language Processing to Automate Esophagitis Severity Grading from the Electronic Health Records

S. Chen, M. Guevara, N. Ramirez, H. Aerts, T.A. Miller, G.K. Savova, R.H. Mak, D.S. Bitterman

doi:10.1016/j.ijrobp.2023.06.238

Abstract

Radiotherapy (RT) toxicities can impair survival and quality of life, yet their risk factors and optimal management are under-studied. Real-world evidence holds enormous potential to improve our understanding of RT adverse events, but this information is often documented only in clinic notes and cannot, at present, be automatically extracted. To address this unmet need, we developed natural language processing (NLP) algorithms to automatically identify the presence and severity of esophagitis from notes of patients treated with thoracic RT.

Our corpus consisted of (1) a gold-labeled dataset of 1524 clinic notes from 124 lung cancer patients treated with RT (in-domain), manually annotated for CTCAE v5.0 esophagitis grade, and (2) a silver-labeled dataset of 2420 notes from 1832 patients on whom toxicity grades had been collected as structured data during clinical care. We developed a fine-tuning pipeline for pre-trained BERT-based neural models for 3 tasks: (1) classifying the presence of esophagitis, (2) classifying grade 0-1 vs. ≥2 esophagitis, and (3) classifying grade 0 vs. 1 vs. 2-3. A note sectionizer was used to let the model focus on the most informative sections. For out-of-domain transferability testing, we performed independent validation in a separate clinical cohort: a manually annotated dataset of 345 notes from 75 esophageal cancer patients treated with RT. We also report patient-level results by evaluating the maximum predicted grade per patient.

Fine-tuning PubMedBERT yielded the best-performing models. Performance is shown in the table. Selecting the most informative note sections (primarily Interval History, Assessment & Plan) during fine-tuning improved macro-F1 by ≥2% for all tasks. Including silver-labeled data improved macro-F1 by ≥3% across all tasks.
To the best of our knowledge, this is the first effort to automatically extract toxicity severity according to CTCAE guidelines from clinic notes, providing proof-of-concept for NLP to support detailed toxicity reporting. Fine-tuning on note sections and leveraging silver-labeled data enabled promising performance despite small datasets, informing future research into NLP for automated toxicity monitoring. Future work will extend these methods to other cancer diagnoses and toxicities, and to toxicity risk prediction.
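
The three classification tasks, the patient-level aggregation, and the macro-F1 metric described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names (`task_labels`, `patient_level`, `macro_f1`) are hypothetical, grades are assumed to be integers 0-3, and "presence of esophagitis" is assumed to mean grade ≥1.

```python
def task_labels(grade):
    """Map a CTCAE esophagitis grade to labels for the three tasks.

    Assumption: grades are integers 0-3, and "presence" means grade >= 1.
    """
    if grade not in (0, 1, 2, 3):
        raise ValueError(f"unexpected grade: {grade!r}")
    return {
        "presence": int(grade >= 1),   # task 1: esophagitis present vs. absent
        "grade_ge2": int(grade >= 2),  # task 2: grade 0-1 vs. >= 2
        "three_class": min(grade, 2),  # task 3: 0 vs. 1 vs. 2-3 (class index 2)
    }


def patient_level(note_predictions):
    """Collapse note-level predictions to the maximum predicted grade per patient.

    note_predictions: iterable of (patient_id, predicted_grade) pairs.
    """
    result = {}
    for patient_id, grade in note_predictions:
        result[patient_id] = max(grade, result.get(patient_id, grade))
    return result


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Macro-F1 weights each class equally regardless of how many notes carry it, which matters when higher grades are rare relative to grade 0.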

Journal: International Journal of Radiation Oncology*Biology*Physics
Publication Date: Sep 29, 2023
Citations: 1

Similar Papers

Using deep learning-based natural language processing to identify reasons for statin nonuse in patients with atherosclerotic cardiovascular disease
Ashish Sarraju ... Fatima Rodriguez
Communications medicine | VOL. 2
15 Jul 2022

Deep learning approach to detection of colonoscopic information from unstructured reports
Donghyeong Seong ... Yoon Ho Choi
BMC Medical Informatics and Decision Making | VOL. 23
07 Feb 2023

Natural Language Processing to Automatically Extract the Presence and Severity of Esophagitis in Notes of Patients Undergoing Radiotherapy.
Shan Chen ... Marco Guevara
JCO Clinical Cancer Informatics | VOL. 7
01 Jul 2023

Wave2Vec: Vectorizing Electroencephalography Bio-Signal for Prediction of Brain Disease.
Seonho Kim ... Hong-Woo Chun
International Journal of Environmental Research and Public Health | VOL. 15
01 Aug 2018
