Abstract

Background

Finding biomedical named entities is one of the most essential tasks in biomedical text mining. Recently, deep learning-based approaches have been applied to biomedical named entity recognition (BioNER) and have shown promising results. However, because deep learning approaches need abundant training data, a lack of data can hinder performance. BioNER datasets are scarce resources, and each dataset covers only a small subset of entity types. Furthermore, many bio entities are polysemous, which is one of the major obstacles in named entity recognition.

Results

To address the lack of data and the entity type misclassification problem, we propose CollaboNet, which utilizes a combination of multiple NER models. In CollaboNet, models trained on different datasets are connected to each other so that a target model obtains information from the other collaborator models to reduce false positives. Every model is an expert on its target entity type and takes turns serving as a target and a collaborator model during training. The experimental results show that CollaboNet can be used to greatly reduce the number of false positives and misclassified entities, including polysemous words. CollaboNet achieved state-of-the-art performance in terms of precision, recall and F1 score.

Conclusions

We demonstrated the benefits of combining multiple models for BioNER. Our model successfully reduced the number of misclassified entities and improved performance by leveraging multiple datasets annotated for different entity types. Given the state-of-the-art performance of our model, we believe that CollaboNet can improve the accuracy of downstream biomedical text mining applications such as bio-entity relation extraction.
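The turn-taking described in the abstract, where each model serves once as the target while the others act as collaborators, can be sketched as a simple round-robin schedule. This is only an illustration under stated assumptions: the function name `role_schedule`, the `rounds` parameter, and the fixed ordering of turns are hypothetical and not taken from the paper, and the sketch covers only the assignment of roles, not the training itself. The dataset names are the ones mentioned in the highlights.

```python
from typing import Iterator, List, Tuple


def role_schedule(
    datasets: List[str], rounds: int
) -> Iterator[Tuple[int, str, List[str]]]:
    """Yield (round, target, collaborators) tuples.

    Assumption: within each round, every dataset's model takes exactly one
    turn as the target, and all remaining models act as collaborators.
    """
    for r in range(rounds):
        for target in datasets:
            # Every model except the current target is a collaborator.
            collaborators = [d for d in datasets if d != target]
            yield r, target, collaborators


if __name__ == "__main__":
    datasets = ["JNLPBA", "BC5CDR-chem", "BC5CDR-disease"]
    for r, target, collaborators in role_schedule(datasets, rounds=1):
        print(f"round {r}: target={target}; collaborators={', '.join(collaborators)}")
```

In a real system, each turn would update only the target model while it reads information from the collaborators; how that information is passed between models is not specified in the abstract and is left out here.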

Highlights

  • Finding biomedical named entities is one of the most essential tasks in biomedical text mining

  • We found that the JNLPBA dataset from Crichton et al. [24] contained sentences that were incorrectly split

  • Since Wang et al. [25] used BC5CDR-both for their experiments, we reran their models on BC5CDR-chem and BC5CDR-disease for a fair comparison with other models

Summary

Introduction

Finding biomedical named entities is one of the most essential tasks in biomedical text mining. Deep learning-based approaches have been applied to biomedical named entity recognition (BioNER) and have shown promising results. There were 4.7 million full-text, online-accessible articles in PubMed Central [1] in 2017. This has led to demand for automated extraction of information from the literature. Named entity recognition (NER) is the computerized identification of named entities in text. Biomedical named entity recognition (BioNER) is an essential building block of many downstream text mining applications, such as extracting drug-drug interactions [8] and disease-treatment relations [9]. BioNER is also used when building a sophisticated biomedical entity search tool [10] that enables users to pose complex queries to search for bio-entities.
