Abstract

Recognition of biomedical entities (such as genes, proteins, chemicals, and diseases) is the foundation of biomedical text mining, which plays a significant role in extracting biomedical entity relations and constructing biomedical knowledge bases. To address shortcomings of current disease name recognition systems, this paper proposes a set of new syntactic and semantic features for disease name recognition. The syntactic features include chunk and dependency information, while the semantic features include the abbreviation form of a disease, its dictionary entry form, and hyponymy relationships between disease concepts. Experiments on the NCBI disease corpus show that a conditional random field (CRF) model combined with these syntactic and semantic features significantly improves on state-of-the-art disease entity recognition, achieving an F1 score of 85.3%.
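
As a rough illustration of how such features might be fed to a CRF, the sketch below builds one feature dictionary per token from chunk tags, dependency heads and labels, an abbreviation heuristic, and a dictionary lookup. It is not the paper's implementation: the function names, the feature names, the toy lexicon, the abbreviation rule, and the hand-written parse are all assumptions made for this example, and the hyponymy feature is omitted because the abstract does not say how it is computed.

```python
import re
from typing import Dict, List, Set, Union

# Hypothetical toy lexicon; the dictionary used in the paper is not specified here.
DISEASE_LEXICON: Set[str] = {"cystic fibrosis", "breast cancer"}


def looks_like_abbreviation(token: str) -> bool:
    """Assumed heuristic: 2-6 uppercase letters or digits, with at least one letter."""
    return bool(re.fullmatch(r"[A-Z0-9]{2,6}", token)) and any(c.isalpha() for c in token)


def dictionary_flags(tokens: List[str], lexicon: Set[str], max_len: int = 4) -> List[bool]:
    """Mark every token covered by a lexicon entry of up to max_len tokens."""
    lowered = [t.lower() for t in tokens]
    flags = [False] * len(tokens)
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            if " ".join(lowered[start:end]) in lexicon:
                for i in range(start, end):
                    flags[i] = True
    return flags


def token_features(
    tokens: List[str],
    chunk_tags: List[str],
    dep_heads: List[int],   # index of each token's head, -1 for the root
    dep_labels: List[str],
    lexicon: Set[str],
) -> List[Dict[str, Union[str, bool]]]:
    """Build one feature dict per token, in the shape many CRF toolkits accept."""
    in_dict = dictionary_flags(tokens, lexicon)
    features = []
    for i, tok in enumerate(tokens):
        head = dep_heads[i]
        features.append({
            "word.lower": tok.lower(),
            "chunk": chunk_tags[i],                                      # syntactic: chunk tag
            "dep.label": dep_labels[i],                                  # syntactic: dependency label
            "dep.head": tokens[head].lower() if head >= 0 else "ROOT",  # syntactic: head word
            "is_abbrev": looks_like_abbreviation(tok),                   # semantic: abbreviation form
            "in_dict": in_dict[i],                                       # semantic: dictionary entry
        })
    return features


# Hand-annotated example sentence; a real pipeline would take these from a chunker and parser.
tokens = ["Mutations", "cause", "cystic", "fibrosis", "(", "CF", ")"]
chunk_tags = ["B-NP", "B-VP", "B-NP", "I-NP", "O", "B-NP", "O"]
dep_heads = [1, -1, 3, 1, 5, 3, 5]
dep_labels = ["nsubj", "root", "amod", "dobj", "punct", "appos", "punct"]

feats = token_features(tokens, chunk_tags, dep_heads, dep_labels, DISEASE_LEXICON)
```

Each dictionary in `feats` corresponds to one token, so the list can be paired with a BIO label sequence and passed to a CRF trainer that accepts per-token feature dictionaries.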
