Abstract

Background: The bidirectional encoder representations from transformers (BERT) model has achieved great success in many natural language processing (NLP) tasks, such as named entity recognition and question answering. However, little prior work has explored the use of this model for an important task in the biomedical and clinical domains, namely entity normalization.

Objective: We aim to investigate the effectiveness of BERT-based models for biomedical and clinical entity normalization. In addition, we aim to investigate whether, and to what degree, the domain of the training data influences the performance of BERT-based models.

Methods: Our data comprised 1.5 million unlabeled electronic health record (EHR) notes. We first fine-tuned BioBERT on this large collection of unlabeled EHR notes, producing our BERT-based model trained on 1.5 million EHR notes (EhrBERT). We then further fine-tuned EhrBERT, BioBERT, and BERT on three annotated corpora for biomedical and clinical entity normalization: the Medication, Indication, and Adverse Drug Events (MADE) 1.0 corpus, the National Center for Biotechnology Information (NCBI) disease corpus, and the Chemical-Disease Relations (CDR) corpus. We compared our models with two state-of-the-art normalization systems, MetaMap and disease name normalization (DNorm).

Results: EhrBERT achieved 40.95% F1 on the MADE 1.0 corpus for mapping named entities to the Medical Dictionary for Regulatory Activities and the Systematized Nomenclature of Medicine—Clinical Terms (SNOMED-CT), which together contain about 380,000 terms. On this corpus, EhrBERT outperformed MetaMap by 2.36% in F1. On the NCBI disease corpus and the CDR corpus, EhrBERT also outperformed DNorm, improving F1 from 88.37% and 89.92% to 90.35% and 93.82%, respectively. EhrBERT also outperformed BioBERT and BERT on the MADE 1.0 corpus and the CDR corpus.

Conclusions: Our work shows that BERT-based models achieve state-of-the-art performance for biomedical and clinical entity normalization, and they can be readily fine-tuned to normalize any kind of named entities.
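As a concrete illustration of the fine-tuning step described above, the sketch below frames entity normalization as multi-class classification of a mention string into controlled-vocabulary concept IDs using the Hugging Face Transformers library. This formulation, the bert-base-uncased checkpoint (a stand-in for the BioBERT/EhrBERT checkpoints), the two-concept label set, and the training pair are illustrative assumptions only; the abstract does not specify these implementation details.

```python
# Minimal sketch: entity normalization as classification of a mention into concept IDs.
# Assumptions: Hugging Face Transformers + PyTorch; bert-base-uncased stands in for
# BioBERT/EhrBERT; the label set and training pair below are illustrative only.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Illustrative label set: 60845006 is the SNOMED-CT "Dyspnea on exertion" example
# from the paper; the second code is a placeholder.
concept_ids = ["60845006", "12345678"]
label2id = {c: i for i, c in enumerate(concept_ids)}
id2label = {i: c for c, i in label2id.items()}

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(concept_ids),
    label2id=label2id,
    id2label=id2label,
)

# One annotated mention -> concept pair (illustrative training example).
batch = tokenizer(["dyspnea on exertion"], return_tensors="pt",
                  padding=True, truncation=True)
labels = torch.tensor([label2id["60845006"]])

model.train()
loss = model(**batch, labels=labels).loss  # cross-entropy over concept IDs
loss.backward()                            # one fine-tuning step (optimizer omitted)

# Inference: pick the highest-scoring concept for a new mention.
model.eval()
with torch.no_grad():
    query = tokenizer(["shortness of breath on exertion"], return_tensors="pt")
    logits = model(**query).logits
print(id2label[int(logits.argmax(dim=-1))])
```

In a real setting the label space would cover the full target vocabulary (for MADE 1.0, roughly 380,000 MedDRA and SNOMED-CT terms), which is what makes that corpus a hard normalization benchmark.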

Highlights

  • Entity normalization (EN) is the process of mapping a named entity mention to a term in a controlled vocabulary [1].

  • We aimed to explore the effectiveness of bidirectional encoder representations from transformers (BERT)-based models for the entity normalization task in the biomedical and clinical domains.

  • By comparing BERT, BioBERT, and EhrBERT, we found that the domain of the training data influences the performance of BERT-based models.


Summary

Background

Entity normalization (EN) is the process of mapping a named entity mention (eg, dyspnea on exertion) to a term (eg, 60845006: Dyspnea on exertion) in a controlled vocabulary (eg, Systematized Nomenclature of Medicine—Clinical Terms [SNOMED-CT]) [1]. It is a significant task for natural language processing (NLP) [2]. Deep representation-learning models, such as bidirectional encoder representations from transformers (BERT) and embeddings from language models (ELMo), have been shown to improve many NLP tasks [11,12]. These models typically rely on unsupervised pretraining to learn language representations from large-scale raw text. However, little work has explored such models for biomedical and clinical entity normalization tasks.
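The unsupervised pretraining mentioned above is, for BERT-style models, masked language modeling: the model learns representations from raw, unlabeled text by predicting tokens that have been hidden from it. The sketch below shows the idea with Hugging Face Transformers; the bert-base-uncased checkpoint and the example sentence are illustrative stand-ins, not the corpora or models evaluated in this work.

```python
# Minimal sketch of masked language modeling: predict a hidden token from context.
# Assumptions: Hugging Face Transformers + PyTorch; bert-base-uncased and the
# sample sentence are illustrative only.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Mask one token and let the model reconstruct it from the surrounding context.
text = f"The patient reports {tokenizer.mask_token} on exertion."
inputs = tokenizer(text, return_tensors="pt")
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]

model.eval()
with torch.no_grad():
    logits = model(**inputs).logits

predicted_id = int(logits[0, mask_pos].argmax())
print(tokenizer.decode([predicted_id]))  # the model's best guess for the masked token
```

During pretraining, the cross-entropy loss on such masked positions is what drives the model to learn domain-specific representations; continuing this step on unlabeled EHR notes is how a clinical-domain model such as EhrBERT is obtained from a general or biomedical checkpoint.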

