Identify diabetic retinopathy-related clinical concepts and their attributes using transformer-based natural language processing methods

Zehao Yu, Yinghan Ma, Xi Yang, Ruogu Fang, Skylar E Stolte, Gianna L Sweeting, Yonghui Wu

doi:10.1186/s12911-022-01996-2

Open Access

https://doi.org/10.1186/s12911-022-01996-2

Abstract

Background: Diabetic retinopathy (DR) is a leading cause of blindness in American adults. If detected, DR can be treated to prevent further damage that leads to blindness. There is increasing interest in developing artificial intelligence (AI) technologies to help detect DR using electronic health records. The lesion-related information documented in fundus image reports is a valuable resource that could help diagnose DR in clinical decision support systems. However, most studies of AI-based DR diagnosis rely on medical images; few studies have explored the lesion-related information captured in free-text image reports.

Methods: In this study, we examined two state-of-the-art transformer-based natural language processing (NLP) models, BERT and RoBERTa, and compared them with a recurrent neural network implemented using long short-term memory (LSTM) for extracting DR-related concepts from clinical narratives. We identified four categories of DR-related clinical concepts (lesions, eye parts, laterality, and severity), developed annotation guidelines, annotated a DR corpus of 536 image reports, and developed transformer-based NLP models for clinical concept extraction and relation extraction. We also examined relation extraction under two settings: a 'gold-standard' setting, in which gold-standard concepts were used, and an end-to-end setting.

Results: For concept extraction, the BERT model pretrained with the MIMIC III dataset achieved the best performance (0.9503 and 0.9645 for strict and lenient evaluation, respectively). For relation extraction, the BERT model pretrained using general English text achieved the best strict/lenient F1-score of 0.9316. The end-to-end system, BERT_general_e2e, achieved the best strict and lenient F1-scores of 0.8578 and 0.8881, respectively. Another end-to-end system based on the RoBERTa architecture, RoBERTa_general_e2e, achieved the same strict score as BERT_general_e2e.

Conclusions: This study demonstrated the efficiency of transformer-based NLP models for clinical concept extraction and relation extraction. Our results show that it is necessary to pretrain transformer models using clinical text to optimize performance for clinical concept extraction. For relation extraction, by contrast, transformers pretrained using general English text perform better.
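The abstract reports scores under strict and lenient evaluation without defining them. The sketch below shows one common way such scores are computed for concept extraction: a strict match requires identical boundaries and concept type, while a lenient match only requires overlapping boundaries with the same type. This is an illustration under assumptions, not the authors' evaluation script; the span representation, the function names, and the example offsets are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical span format: (start offset, end offset exclusive, concept type).
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """Return True if the two spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def matches(pred: Span, gold: Span, lenient: bool) -> bool:
    """Strict: same boundaries and type. Lenient: overlapping boundaries, same type."""
    if pred[2] != gold[2]:
        return False
    if lenient:
        return overlaps(pred, gold)
    return pred[:2] == gold[:2]


def f1_score(gold: List[Span], pred: List[Span], lenient: bool = False) -> float:
    """Compute F1 from span-level precision and recall."""
    matched_pred = sum(any(matches(p, g, lenient) for g in gold) for p in pred)
    matched_gold = sum(any(matches(p, g, lenient) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up annotations for one report (offsets are illustrative only).
gold = [(0, 14, "lesion"), (22, 31, "eye_part"), (35, 39, "laterality")]
pred = [(0, 14, "lesion"), (24, 31, "eye_part"), (50, 56, "severity")]

strict_f1 = f1_score(gold, pred, lenient=False)
lenient_f1 = f1_score(gold, pred, lenient=True)
print(f"strict F1 = {strict_f1:.4f}, lenient F1 = {lenient_f1:.4f}")
```

In this toy case the second prediction has the right type but a shifted start offset, so it counts only under lenient matching, which is why lenient scores are never lower than strict ones.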

Journal: BMC Medical Informatics and Decision Making
Publication Date: Sep 27, 2022
Citations: 4
License type: open-access

Similar Papers

Clinical concept and relation extraction using prompt-based machine reading comprehension.
Cheng Peng ... Zehao Yu
Journal of the American Medical Informatics Association : JAMIA | VOL. 30
14 Jun 2023

Identify Diabetic Retinopathy-related Clinical Concepts Using Transformer-based Natural Language Processing Methods
Zehao Yu ... Ruogu Fang
01 Aug 2021

Measurement of Semantic Textual Similarity in Clinical Texts: Comparison of Transformer-Based Models.
Xi Yang ... Yonghui Wu
JMIR Medical Informatics | VOL. 8
23 Nov 2020

ER-LAC: Span-Based Joint Entity and Relation Extraction Model with Multi-Level Lexical and Attention on Context Features
Yaqin Zhu ... Cairong Yan
Applied Sciences | VOL. 13
21 Sep 2023
