Abstract

Named Entity Recognition and Classification (NERC) is one of the most fundamental tasks in biomedical information extraction. Biomedical named entities (NEs) include mentions of proteins, genes, DNA, RNA, etc., which in general have complex structures and are difficult to recognize. We have developed a large number of features for identifying NEs in biomedical texts. Two robust and diverse classification methods, Conditional Random Field (CRF) and Support Vector Machine (SVM), are used to build a number of models that differ in their representations of the feature set and/or feature templates. The outputs of these classifiers are then combined using a multiobjective weighted voting approach. We hypothesize that the reliability of each classifier's predictions differs among the output classes. Thus, in an ensemble system, it is necessary to determine an appropriate vote weight for each output class in each classifier. Here, a multiobjective genetic algorithm is used to determine these vote weights. Evaluated on the JNLPBA 2004 benchmark dataset, the technique yields overall recall, precision and F-measure values of 74.10%, 77.58% and 75.80%, respectively.
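
To make the combination step concrete, the following is a minimal sketch of class-wise weighted voting for a single token, assuming each classifier emits one label and has its own weight per output class. The function name `weighted_vote`, the labels, and the weight values are illustrative assumptions, not taken from the paper; in the described system the weights would be found by the multiobjective genetic algorithm rather than set by hand.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine the labels that several classifiers predict for one token.

    predictions: list of labels, one per classifier.
    weights: list of dicts; weights[i][label] is the vote weight of
             classifier i for that output class (hypothetical values here).
    """
    scores = defaultdict(float)
    for clf_index, label in enumerate(predictions):
        # Each classifier votes for its own prediction with the weight
        # assigned to that class for that classifier.
        scores[label] += weights[clf_index].get(label, 0.0)
    # Iterating in sorted order makes ties resolve to the
    # alphabetically first label, so the result is deterministic.
    return max(sorted(scores), key=lambda label: scores[label])


# Hand-picked weights for three hypothetical classifiers (assumption).
weights = [
    {"B-protein": 0.9, "B-DNA": 0.4, "O": 0.6},
    {"B-protein": 0.5, "B-DNA": 0.8, "O": 0.7},
    {"B-protein": 0.6, "B-DNA": 0.5, "O": 0.9},
]

# Two classifiers agree on B-DNA and outweigh the first one.
agreed = weighted_vote(["B-protein", "B-DNA", "B-DNA"], weights)

# All three disagree; the most reliable vote for its class wins.
split = weighted_vote(["B-protein", "O", "B-DNA"], weights)
```

In the second call plain majority voting would produce a three-way tie, whereas the per-class weights let the classifier that is most reliable for its predicted class decide the outcome.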
