A comparative study of segment representation for biomedical named entity recognition

H L Shashirekha, Hamada A Nayel

doi:10.1109/icacci.2016.7732182

Abstract

Biomedical Named Entity Recognition (Bio-NER) is an important subtask of Biomedical Text Mining (BioTM): the performance of downstream tasks, such as relation extraction, protein-protein interaction and hypothesis generation, depends on the performance of Bio-NER. Bio-NER involves identifying the biomedical named entities, such as DNA, RNA, cell types, genes and proteins, that are present in biomedical research articles. Annotating the dataset used to train a classifier to recognize and classify named entities is a crucial task in Bio-NER. Segment representation (SR) is an efficient way of annotating Biomedical Named Entities (BioNEs) within a sentence to differentiate them from non-BioNEs. In this paper, we use Support Vector Machines (SVMs) and Conditional Random Fields (CRFs) to train different Bio-NER models on the benchmark JNLPBA 2004 and i2b2 2010 shared task datasets using different SRs. The performance of the SR models shows that the more complex the model, the lower its F-score.

Full Text