Abstract

Biomedical Named Entity Recognition (BNER) is the identification of entities such as drugs, genes, and chemicals in biomedical text, which supports information extraction from the domain literature. It allows extraction of information such as drug profiles, similar or related drugs, and associations between drugs and their targets. Although many machine learning methods have been applied, this area still offers room for improvement, particularly for biologically relevant chemical entities, whose structures and properties vary widely. The proposed approach combines two state-of-the-art algorithms, Conditional Random Fields (CRF) and Support Vector Machines (SVM), and aims to improve performance by applying them to a varied set of features, including linguistic, orthographic, morphological, domain, and local context features. It uses the sequence tagging capability of CRF to identify entity boundaries and the classification efficiency of SVM to detect entity subtypes. The method is tested on two datasets with different entity types: (1) GENIA and (2) the CHEMDNER corpus. The results show that the proposed hybrid method improves BNER compared with conventional machine learning algorithms. SVM and the associated methodologies are also studied in detail: Section 3 covers linear and non-linear text classification, and the final section describes the results and the evaluation of the proposed method.
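
To make the feature families named above concrete, the following is a minimal sketch of per-token orthographic, morphological, and local context features of the kind commonly fed to a sequence tagger. The function names (`token_features`, `word_shape`) and the specific features chosen are illustrative assumptions, not the feature set used in the paper.

```python
import re


def word_shape(tok):
    """Map a token to a coarse shape: uppercase -> X, lowercase -> x, digit -> d."""
    shape = re.sub(r"[A-Z]", "X", tok)
    shape = re.sub(r"[a-z]", "x", shape)
    shape = re.sub(r"[0-9]", "d", shape)
    return shape


def token_features(tokens, i):
    """Return a feature dict for tokens[i] (hypothetical feature set for illustration)."""
    tok = tokens[i]
    return {
        # Orthographic features: surface form of the token
        "is_capitalized": tok[:1].isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "has_hyphen": "-" in tok,
        "shape": word_shape(tok),
        # Morphological features: short prefix and suffix
        "prefix3": tok[:3].lower(),
        "suffix3": tok[-3:].lower(),
        # Local context features: neighbouring tokens, with boundary markers
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


features = token_features(["IL-2", "gene", "expression"], 0)
print(features)
```

Linguistic features (for example part-of-speech tags) and domain features (for example dictionary lookups) would be added to the same dictionary; they are left out here because they need external resources.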

Highlights

  • Named Entity Recognition (NER) refers to identifying and classifying terms belonging to a domain from unstructured text and mapping them to predefined categories

  • Conditional Random Field (CRF) is the most widely used algorithm for named entity recognition, since it combines the capabilities of discriminative classification and graphical modelling in one model

  • Support Vector Machine (SVM) is a state-of-the-art classifier that can perform both linear and non-linear classification
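
An SVM moves from linear to non-linear classification by replacing the dot product between feature vectors with a kernel function. The sketch below shows the two most common choices, the linear kernel and the radial basis function (RBF) kernel. The function names and the default `gamma=1.0` are illustrative assumptions; the excerpt does not state which kernel or parameters the paper uses.

```python
import math


def linear_kernel(x, y):
    """Plain dot product: yields a linear decision boundary."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel exp(-gamma * ||x - y||^2): yields a non-linear boundary.

    gamma is an arbitrary illustrative value, not a setting from the paper.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


print(linear_kernel([1.0, 2.0], [3.0, 4.0]))
print(rbf_kernel([0.0, 0.0], [1.0, 0.0]))
```

The RBF value is 1 for identical vectors and decays towards 0 as they move apart, so it behaves as a similarity score rather than a projection.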


Summary

Introduction

Named Entity Recognition (NER) refers to identifying and classifying terms belonging to a domain from unstructured text and mapping them to predefined categories. A dictionary-based approach is useful if the vocabulary is complete and up to date, and it requires pre-processing such as normalization to match the text against the vocabulary. Such approaches do not extract unseen entities: if the patterns vary or a term is not present in the vocabulary, the entity is not extracted. A machine learning based approach addresses this problem by learning the distinctive features that identify an entity. Supervised machine learning algorithms are used for extracting named entities, and they require large amounts of annotated data to learn the features of entities [2]. Microbiological characteristics were analysed in [3], and a comparative analysis of chemical and microbiological characteristics was carried out to analyse antibacterial activities [4].
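
The limitation of the dictionary-based approach can be illustrated with a minimal sketch. The helper names (`normalize`, `dictionary_ner`), the toy vocabulary, and the example sentences are all hypothetical and serve only to show the matching step and its blind spot.

```python
import re


def normalize(text):
    """Lowercase and replace punctuation/hyphens with single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def dictionary_ner(sentence, vocabulary):
    """Return (term, category) pairs for vocabulary terms found in the sentence."""
    # Pad with spaces so that only whole-word matches count.
    padded = f" {normalize(sentence)} "
    hits = []
    for term, category in vocabulary.items():
        if f" {normalize(term)} " in padded:
            hits.append((term, category))
    return hits


# Toy vocabulary (illustrative only)
vocab = {"Interleukin-2": "protein", "aspirin": "drug"}

# Normalization lets "interleukin 2" match the entry "Interleukin-2".
print(dictionary_ner("Aspirin did not alter interleukin 2 levels.", vocab))

# "Ibuprofen" is absent from the vocabulary, so nothing is extracted.
print(dictionary_ner("Ibuprofen was also tested.", vocab))
```

The second call returns an empty list even though the sentence mentions a drug, which is the unseen-entity gap that a learned model is meant to close.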

