Abstract

Recognition of biomedical entities such as genes, proteins, chemicals, and diseases is the first and most fundamental task in biomedical literature mining. Most recent biomedical named entity recognition (Bio-NER) methods rely on predefined features that try to capture the specific surface properties of each entity type. However, these empirically predefined feature sets differ between entity types, and they are complex and manually constructed, which makes their development costly. This paper presents a comparative evaluation of a traditional feature representation method and new prototypical representation methods with three machine learning classifiers (Support Vector Machine (SVM), Naive Bayes (NB), and K-Nearest Neighbor (KNN)) for Bio-NER. Several comparative experiments are conducted on a widely used standard Bio-NER dataset, the GENIA corpus. The paper demonstrates that prototypical word representation methods can be used successfully for Bio-NER. Experimental results show that the prototypical representation methods improve the performance of all three machine learning models. Finally, the experiments indicate that the SVM classifier with prototypical representation methods yields the best results.
