Abstract

Background: Medical entity recognition is a key technology supporting the development of smart medicine. Methods for English medical entity recognition have advanced considerably, but progress in Chinese has been slow. Because of the complexity of the Chinese language and the scarcity of annotated corpora, existing methods rely on simple neural networks, which can neither effectively extract the deep semantic representations of electronic medical records (EMRs) nor cope with scarce medical corpora. We therefore developed a new Chinese EMR (CEMR) dataset with six types of entities and propose a multi-level representation learning model based on Bidirectional Encoder Representation from Transformers (BERT) for Chinese medical entity recognition.

Objective: This study aimed to improve the performance of the language model by having it learn multi-level representations and recognize Chinese medical entities.

Methods: We investigated the pretrained language representation model and found that utilizing information not only from the final layer but also from intermediate layers affects performance on the Chinese medical entity recognition task. We therefore propose a multi-level representation learning model for entity recognition in Chinese EMRs. Specifically, we first use the BERT language model to extract semantic representations. Then, a multi-head attention mechanism automatically extracts deeper semantic information from each layer. Finally, the representations produced by multi-level extraction serve as the final semantic context embedding for each token, and softmax predicts the entity tags.

Results: The best F1 score reached 82.11% on the CEMR dataset, and the F1 score increased further to 83.18% on the CCKS (China Conference on Knowledge Graph and Semantic Computing) 2018 benchmark dataset. Various comparative experiments showed that the proposed method outperforms methods from previous work and establishes a new state of the art.

Conclusions: The multi-level representation learning model is proposed as a method to perform the Chinese EMR entity recognition task. Experiments on two clinical datasets demonstrate the usefulness of using the multi-head attention mechanism to extract multi-level representations as part of the language model.
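The core of the method described above is aggregating hidden states from several encoder layers with multi-head attention before tag prediction. Below is a minimal NumPy sketch of that idea, not the authors' implementation: the layer count, head count, toy dimensions, and the choice of final-layer queries are illustrative assumptions, and in the actual model the hidden states would come from BERT.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_layer_attention(layer_states, n_heads=4):
    """Fuse per-layer hidden states into one representation per token.

    layer_states: (L, T, H) array of hidden states from L encoder layers.
    The final layer supplies the query; every layer supplies keys/values,
    so each token attends over the L layers (an assumed design choice).
    """
    L, T, H = layer_states.shape
    d = H // n_heads
    q = layer_states[-1]                      # (T, H) queries from the top layer
    out = np.zeros((T, H))
    for h in range(n_heads):
        sl = slice(h * d, (h + 1) * d)
        qh = q[:, sl]                          # (T, d)
        kh = layer_states[:, :, sl]            # (L, T, d)
        vh = kh
        # per-token scaled dot-product attention over the L layers
        scores = np.einsum('td,ltd->tl', qh, kh) / np.sqrt(d)  # (T, L)
        w = softmax(scores, axis=-1)
        out[:, sl] = np.einsum('tl,ltd->td', w, vh)
    return out

# toy setup: 4 layers, 6 tokens, hidden size 16, 7 entity tags
layers = rng.normal(size=(4, 6, 16))
ctx = multi_head_layer_attention(layers)       # (6, 16) fused context embedding
W = rng.normal(size=(16, 7)) * 0.1             # hypothetical tag classifier weights
tag_probs = softmax(ctx @ W)                   # (6, 7) per-token tag distribution
pred = tag_probs.argmax(axis=-1)               # predicted tag index per token
```

In the full model, `layer_states` would be the stacked hidden states of all BERT layers and the classifier weights would be trained jointly with the attention parameters; here both are random placeholders to keep the sketch self-contained.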

Highlights

  • Electronic medical records (EMRs) comprise patients’ health information

  • We found that the proposed model outperforms state-of-the-art baseline methods, including Global Vectors (GloVe), improving F1 scores by 0.94% to 4.9%

  • Our multi-level entity recognition (ER) learning model improved precision (P) by 1.48%, recall (R) by 0.47%, and F1 score by 0.94% compared to the Bidirectional Encoder Representation from Transformers (BERT) model



Introduction

Background

Electronic medical records (EMRs) comprise patients’ health information, and diagnostic accuracy can be improved by making full use of the information available in EMRs. Research has demonstrated that embedding techniques can help address the shortage of supervised data in natural language processing (NLP) tasks; these include the factorization method Global Vectors (GloVe) [9], the neural methods word2vec [10] and fastText [11], and more recent dynamic methods that take context into account, such as Embeddings from Language Models (ELMo) [12] and OpenAI Generative Pre-trained Transformer (GPT) [13]. We developed a new Chinese EMR (CEMR) dataset with six types of entities and propose a multi-level representation learning model based on Bidirectional Encoder Representation from Transformers (BERT) for Chinese medical entity recognition.


