Abstract

Background

In recent years, with the development of artificial intelligence, using deep learning for clinical information extraction has become a new trend. Clinical Event Detection (CED), one of its subtasks, has attracted attention from both academia and industry. However, directly applying advances in deep learning to the CED task often yields unsatisfactory results, for two main reasons: (1) electronic medical records contain a great number of obscure professional terms, which leads to poor recognition performance; (2) the datasets required for the task are scarce, which leads to poor model robustness. Solving these two problems is therefore essential for improving model performance.

Methods

This paper proposes a model for Clinical Event Detection that combines data augmentation and domain information with the TENER model.

Results

We use two evaluation metrics to compare the overall performance of the proposed model with existing models on the 2012 i2b2 challenge dataset. Experimental results show that the proposed model achieves the best F1-score of 80.26%, a type accuracy of 93% and a span F1-score of 90.33%, outperforming state-of-the-art approaches.

Conclusions

This paper proposes a multi-granularity information fusion encoder-decoder framework that applies the TENER model to the CED task for the first time. It uses a pre-trained language model for the biomedical domain (BioBERT) to generate word-level features, addressing the poor recognition performance caused by the many obscure professional terms in electronic medical records. In addition, this paper proposes a new data augmentation method for sequence labeling tasks, addressing the poor model robustness caused by the scarcity of datasets for this task.
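The abstract reports an F1-score, a type accuracy and a span F1-score without defining them. The sketch below assumes one common reading of these metrics: strict F1 requires both the event boundary and the event type to match, span F1 checks only the boundary, and type accuracy is measured over gold events whose span was recovered. The function names `evaluate` and `f1`, the `(start, end, type)` tuple layout and the type labels are illustrative assumptions, not the official i2b2 2012 scorer.

```python
from typing import Dict, List, Tuple

# Assumed event layout: (start, end, type), token offsets with end exclusive.
Event = Tuple[int, int, str]


def f1(tp: int, n_pred: int, n_gold: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


def evaluate(gold: List[Event], pred: List[Event]) -> Dict[str, float]:
    gold_set, pred_set = set(gold), set(pred)

    # Strict F1: boundary and type must both match.
    strict_tp = len(gold_set & pred_set)

    # Span F1: only the boundary must match; the type is ignored.
    gold_spans = {(s, e) for s, e, _ in gold_set}
    pred_spans = {(s, e) for s, e, _ in pred_set}
    span_tp = len(gold_spans & pred_spans)

    # Type accuracy (assumed definition): among gold events whose span was
    # predicted, the fraction whose predicted type is also correct.
    pred_type = {(s, e): t for s, e, t in pred_set}
    matched = [(s, e, t) for s, e, t in gold_set if (s, e) in pred_type]
    correct = sum(1 for s, e, t in matched if pred_type[(s, e)] == t)

    return {
        "strict_f1": f1(strict_tp, len(pred_set), len(gold_set)),
        "span_f1": f1(span_tp, len(pred_spans), len(gold_spans)),
        "type_accuracy": correct / len(matched) if matched else 0.0,
    }
```

For example, with gold events `[(0, 2, "PROBLEM"), (5, 6, "TREATMENT"), (8, 9, "TEST")]` and predictions `[(0, 2, "PROBLEM"), (5, 6, "TEST"), (10, 11, "TEST")]`, strict F1 is 1/3, span F1 is 2/3 and type accuracy is 0.5, because the second prediction finds the right boundary but the wrong type. This is why a span F1 can sit well above the strict F1, as in the reported results.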

Highlights

  • In recent years, with the development of artificial intelligence, the use of deep learning technology for clinical information extraction has become a new trend

  • This paper proposes a new data augmentation method for sequence labeling tasks, addressing the poor model robustness caused by the scarcity of datasets for this task

  • The word-level embedding generated by a pre-trained language model for the medical domain (BioBERT) and the character-level embedding generated by the Transformer are merged and used as the input of the encoder
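The highlights do not describe how the proposed augmentation method works. As a point of reference only, the sketch below shows label-wise token replacement, a generic augmentation technique for sequence labeling that swaps a token for another token seen under the same tag elsewhere in the training data, so the tag sequence stays valid. It is a swapped-in illustration, not the paper's method; the function names, the replacement probability `p` and the example tags are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# A sentence is a list of (token, BIO tag) pairs.
Sentence = List[Tuple[str, str]]


def build_label_vocab(corpus: List[Sentence]) -> Dict[str, List[str]]:
    """Collect every token observed under each tag across the training corpus."""
    vocab: Dict[str, List[str]] = defaultdict(list)
    for sentence in corpus:
        for token, tag in sentence:
            vocab[tag].append(token)
    return vocab


def label_wise_replace(
    sentence: Sentence,
    vocab: Dict[str, List[str]],
    p: float = 0.3,
    rng: Optional[random.Random] = None,
) -> Sentence:
    """Replace each token with probability p by a token sharing its tag.

    Tags are never changed, so the augmented sentence keeps a valid BIO sequence.
    """
    if rng is None:
        rng = random.Random()
    augmented: Sentence = []
    for token, tag in sentence:
        if rng.random() < p and vocab.get(tag):
            token = rng.choice(vocab[tag])
        augmented.append((token, tag))
    return augmented
```

Because only tokens change, each augmented sentence reuses its original labels, which is what makes this kind of replacement cheap for annotated clinical text; the trade-off is that replacements can produce clinically implausible sentences.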


Introduction

With the development of artificial intelligence, using deep learning for clinical information extraction has become a new trend. The task of Clinical Event Detection is to identify the boundaries of events in electronic medical records and determine their types. Because it requires both identifying boundaries and assigning types, event detection is usually treated as a sequence labeling task. Clinical Event Detection (CED) and named entity recognition (NER) both belong to this family of tasks, so it is feasible to apply advances in NER technology directly to CED. Zhu et al. [7] trained a bidirectional LSTM-CRF model with contextual word embeddings for clinical concept extraction, and it achieved the best performance among reported baseline models on the i2b2 2010 challenge dataset. LSTM and CRF models have greatly improved performance on the NER task [8]. Chen et al. [9] proposed the gated relation network (GRN), a simple but effective CNN-based network for NER that is better than common CNNs at capturing long-term context. Graph neural networks (GNNs) have also been widely used in NER tasks [10].
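To make the sequence labeling framing concrete, the sketch below assumes the common BIO scheme, since the text does not say which tagging scheme is used: each token is tagged `B-<type>` at the start of an event, `I-<type>` inside it, or `O` outside, and a decoder turns the predicted tag sequence back into events with a boundary and a type. The function `bio_to_spans` and the tag names are hypothetical.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert a BIO tag sequence into (start, end, type) spans, end exclusive."""
    spans: List[Tuple[int, int, str]] = []
    start: Optional[int] = None
    etype: Optional[str] = None
    # Append a sentinel "O" so a span running to the last token gets closed.
    for i, tag in enumerate(tags + ["O"]):
        if tag.startswith("I-") and etype == tag[2:]:
            continue  # same-type I- tag: extend the current span
        if etype is not None:
            spans.append((start, i, etype))  # close the open span
            start, etype = None, None
        if tag.startswith("B-") or tag.startswith("I-"):
            # A stray I- tag (no matching open span) starts a new span leniently.
            start, etype = i, tag[2:]
    return spans
```

For instance, `bio_to_spans(["B-PROBLEM", "I-PROBLEM", "O", "B-TEST"])` yields `[(0, 2, "PROBLEM"), (3, 4, "TEST")]`: the boundary comes from the B/I run and the type from the tag suffix. Treating a stray `I-` tag as the start of a new span is one lenient convention; a strict decoder could discard it instead.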

