Abstract

Pulmonary nodules and nodule characteristics are important indicators of lung nodule malignancy. However, nodule information is often documented as free text in clinical narratives such as radiology reports in electronic health record systems. Natural language processing (NLP) is the key technology to extract and standardize patient information from radiology reports into structured data elements. This study aimed to develop an NLP system using state-of-the-art transformer models to extract pulmonary nodules and associated nodule characteristics from radiology reports. We identified a cohort of 3080 patients who underwent LDCT at the University of Florida health system and collected their radiology reports. We manually annotated 394 reports as the gold standard. We explored eight pretrained transformer models from three transformer architectures including bidirectional encoder representations from transformers (BERT), robustly optimized BERT approach (RoBERTa), and A Lite BERT (ALBERT), for clinical concept extraction, relation identification, and negation detection. We examined general transformer models pretrained using general English corpora, transformer models fine-tuned using a clinical corpus, and a large clinical transformer model, GatorTron, which was trained from scratch using 90 billion words of clinical text. We compared transformer models with two baseline models including a recurrent neural network implemented using bidirectional long short-term memory with a conditional random fields layer and support vector machines. RoBERTa-mimic achieved the best F1-score of 0.9279 for nodule concept and nodule characteristics extraction. ALBERT-base and GatorTron achieved the best F1-score of 0.9737 in linking nodule characteristics to pulmonary nodules. Seven out of eight transformers achieved the best F1-score of 1.0000 for negation detection. Our end-to-end system achieved an overall F1-score of 0.8869. This study demonstrated the advantage of state-of-the-art transformer models for pulmonary nodule information extraction from radiology reports.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.