Feasibility of Using Zero-Shot Learning in Transformer-Based Natural Language Processing Algorithm for Key Information Extraction from Head and Neck Tumor Board Notes

S Zhu, M Gilbert, A.I. Ghanem, F Siddiqui, K Thind

doi:10.1016/j.ijrobp.2023.06.1743

Abstract

Natural language processing (NLP) technology has the potential to automate information aggregation and summarization in oncology. One example is the automation of patient registry creation. In this work, we aim to show (1) the feasibility of using modern NLP algorithms to extract key information from tumor board notes, and (2) the impact of prompt engineering on the quality of the results. In this IRB-approved study, we obtained the texts of head and neck tumor board notes for 306 unique patients. Five key pieces of information used to create a patient registry were predefined: age, gender, tumor histology, tumor stage, and primary location. The NLP algorithm used was a modified Text-To-Text Transfer Transformer (T5) model that was initially trained on the Colossal Clean Crawled Corpus (C4) dataset and subsequently fine-tuned on the Stanford Question Answering Dataset (SQuAD) to perform the downstream task of extractive question answering. The NLP model and trained weights were obtained from the Hugging Face platform. During inference, the entire body of the tumor board note and a related question were fed as inputs, and the model predicted a sequence of text in response to the question. Two sets of questions with similar semantic meaning were used. Questions in prompt set #1 included "What is the gender?", "What is the age?", "What is the type of carcinoma in pathological diagnosis?", "What is the stage?", and "Where is the carcinoma located at?". Questions in prompt set #2 included "Is the patient male or female?", "How old is the patient?", "What kind of cancer?", "What is the cancer stage?", and "What is the tumor location?". Each model-predicted response was compared to the ground truth extracted from the tumor board notes. A response was classified as true if it was consistent with the ground truth; otherwise, it was deemed false. The response accuracy for each question was subsequently calculated.
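The inference inputs described above can be illustrated with a minimal sketch. It pairs each question from the two prompt sets with the full note text, using the "question: ... context: ..." layout commonly used for T5 on SQuAD-style tasks. Whether the modified model in this study expects exactly that layout is an assumption, as are the names `PROMPT_SETS`, `build_input`, `build_all_inputs`, and the field labels; the model call itself is left out.

```python
# Questions quoted from the abstract; the field labels are illustrative.
PROMPT_SETS = {
    "set1": {
        "gender": "What is the gender?",
        "age": "What is the age?",
        "histology": "What is the type of carcinoma in pathological diagnosis?",
        "stage": "What is the stage?",
        "location": "Where is the carcinoma located at?",
    },
    "set2": {
        "gender": "Is the patient male or female?",
        "age": "How old is the patient?",
        "histology": "What kind of cancer?",
        "stage": "What is the cancer stage?",
        "location": "What is the tumor location?",
    },
}


def build_input(question: str, note: str) -> str:
    """Format one question/note pair in the assumed T5 SQuAD-style layout."""
    # Collapse whitespace so line breaks in the note do not split the context.
    context = " ".join(note.split())
    return f"question: {question} context: {context}"


def build_all_inputs(note: str) -> dict:
    """Return {(prompt_set, field): model_input} for all ten questions."""
    return {
        (set_name, field): build_input(question, note)
        for set_name, questions in PROMPT_SETS.items()
        for field, question in questions.items()
    }


# Illustrative usage with a made-up note, not study data.
example_inputs = build_all_inputs("Example note text.\nSecond line of the note.")
```

Each resulting string would then be passed to the question-answering model, so every note yields ten responses, five per prompt set.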
The median number of words in each tumor board note was 448 (range, 219-1505). The accuracy of the NLP algorithm for each question from either set is reported in Table 1. Algorithm performance was higher for extracting objective information such as age, gender, and histology. In addition, questions with similar semantic meaning but different wording led to significantly different results. We demonstrated that a transformer-based extractive question-answering NLP algorithm can be successfully used to extract information from head and neck tumor board notes with zero-shot learning. Furthermore, our results highlight the significance of prompt engineering when applying NLP to this task. Future work on fine-tuning these algorithms on oncology-specific texts can potentially enhance algorithm performance for more difficult tasks.
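The per-question accuracy calculation can be sketched as follows. The abstract does not say how consistency with the ground truth was judged, so the matching rule here (normalized ground truth contained in the normalized response) is a simplifying assumption, and `normalize`, `is_consistent`, and `accuracy_by_question` are hypothetical names. The records are made-up illustrations, not study data.

```python
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace before comparison."""
    return " ".join(text.lower().split())


def is_consistent(prediction: str, truth: str) -> bool:
    # Assumption: a response counts as true when the normalized ground truth
    # appears inside the normalized response. The study's own criterion may
    # have differed (for example, manual review).
    p, t = normalize(prediction), normalize(truth)
    return bool(t) and t in p


def accuracy_by_question(records):
    """records: iterable of (prompt_set, field, prediction, truth) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for prompt_set, field, prediction, truth in records:
        key = (prompt_set, field)
        total[key] += 1
        correct[key] += int(is_consistent(prediction, truth))
    return {key: correct[key] / total[key] for key in total}


# Made-up example records for illustration only.
records = [
    ("set1", "age", "62", "62"),
    ("set1", "age", "a 70-year-old", "62"),
    ("set2", "age", "62 years", "62"),
]
accuracy = accuracy_by_question(records)
```

Keying the result by both prompt set and field makes it straightforward to compare the two wordings of the same question side by side.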

Journal: International Journal of Radiation Oncology*Biology*Physics
Publication Date: Sep 29, 2023
Citations: 2

