Abstract

Named entity recognition (NER) is an important task in natural language processing (NLP), and in recent years it has attracted considerable attention in the biomedical field. Biomedical NER is harder than general-domain NER, however, because biomedical NER datasets are scarce and biomedical named entities are complex and rare. In this paper, we propose MMBERT, a Transformer-based framework that addresses these problems. To address the scarcity of biomedical NER datasets, we introduce ERNIE-Health, a new Chinese language representation model pre-trained on large-scale biomedical text corpora. To handle the complexity and rarity of biomedical named entities, we use BERT and CW-LSTM structures to obtain a joint feature vector of word-pair relations. In addition, we design a multi-granularity 2D convolution to refine the relations between word pairs and their representations. Finally, we design a convolutional neural network (CNN) structure and a co-predictor to improve the model's generalization capability and prediction accuracy. Extensive experiments on three benchmark datasets show that our model outperforms the baseline models considered in the experiments.
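To make the idea of multi-granularity 2D convolution over word pairs concrete, the following is a minimal sketch, not the MMBERT implementation. It assumes that "granularity" is realized through different dilation rates applied to an n x n word-pair grid, an assumption the abstract does not confirm. The function names (`dilated_conv2d`, `multi_granularity_features`), the scalar cell features, and the all-ones kernel are hypothetical simplifications chosen for illustration.

```python
from typing import List, Sequence

Grid = List[List[float]]


def dilated_conv2d(grid: Grid, kernel: Grid, dilation: int) -> Grid:
    """Apply a square kernel with zero padding and the given dilation.

    The output has the same shape as the input grid. Cell (i, j) of the
    grid is assumed to hold a scalar feature for the word pair (i, j).
    """
    n_rows, n_cols = len(grid), len(grid[0])
    half = len(kernel) // 2
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            total = 0.0
            for di in range(-half, half + 1):
                for dj in range(-half, half + 1):
                    r = i + di * dilation
                    c = j + dj * dilation
                    # Positions outside the grid count as zero padding.
                    if 0 <= r < n_rows and 0 <= c < n_cols:
                        total += kernel[di + half][dj + half] * grid[r][c]
            out[i][j] = total
    return out


def multi_granularity_features(
    grid: Grid, kernel: Grid, dilations: Sequence[int] = (1, 2, 3)
) -> List[List[List[float]]]:
    """Concatenate, per word pair, the conv outputs at each dilation rate.

    Assumption: small dilations capture close word pairs, larger ones
    capture more distant pairs; the real model may differ.
    """
    per_scale = [dilated_conv2d(grid, kernel, d) for d in dilations]
    return [
        [[scale[i][j] for scale in per_scale] for j in range(len(grid[0]))]
        for i in range(len(grid))
    ]


# Toy 5 x 5 word-pair grid with an all-ones 3 x 3 kernel.
grid = [[1.0] * 5 for _ in range(5)]
kernel = [[1.0] * 3 for _ in range(3)]
feats = multi_granularity_features(grid, kernel)
print(feats[2][2])
```

In a trained model the kernels would be learned and each cell would hold a vector rather than a scalar; the sketch only shows how several receptive-field sizes can be combined for every word pair.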
