Protein named entity classification with probabilistic features derived from GENIA corpus and MEDLINE

Sagara Sumathipala, Koichi Yamada, Muneyuki Unehara

doi:10.1109/scis-isis.2014.7044640

Abstract

Biomedical named entity recognition (BNER) is an essential first step for biomedical information retrieval tasks such as discovering relations between biomedical entities and identifying molecular pathways. Although named entity recognition performs well on ordinary text, it remains challenging in the molecular biology domain because of the complex nature of biomedical nomenclature, the variety of spelling forms, and other factors. Even when biomedical entities are found successfully in biological text, classifying them into the relevant biomedical classes such as genes, proteins, diseases and drug names is a further challenge and an open question. This paper presents a new method to classify biomedical named entities into protein and non-protein classes. Our approach employs Random Forest, a machine learning algorithm, with a new combination of features: orthographic, keyword and morphological features, together with a probabilistic feature called Proteinhood and a Protein-Score feature based on the MEDLINE abstracts available through PubMed. The last two features are the main contributions of the paper. A series of experiments is conducted to compare the proposed approach with other state-of-the-art approaches. In experiments on the GENIA corpus, our protein named entity classifier achieves its highest values of 93.8% precision, 83.8% recall and 88.5% F-measure for protein named entity identification. The study also shows the effect of the new Proteinhood and Protein-Score features and of adjusting the parameters of the Random Forest algorithm.
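As a rough illustration of two ideas in the abstract, the sketch below shows what a few orthographic features of a candidate entity might look like, and checks that the reported F-measure agrees with the reported precision and recall when F-measure is taken as their harmonic mean (the usual F1 definition, assumed here). The function names and the particular features are hypothetical; the abstract does not list the paper's actual feature set, and the Proteinhood and Protein-Score features are not reproduced because their definitions are not given here.

```python
import re


def orthographic_features(token: str) -> dict:
    """Hypothetical orthographic cues for a candidate entity token.

    These are common surface-form features in biomedical NER; they are an
    assumption for illustration, not the feature set used in the paper.
    """
    return {
        "init_cap": token[:1].isupper(),                 # starts with a capital
        "all_caps": token.isalpha() and token.isupper(),  # letters only, all upper
        "has_digit": any(c.isdigit() for c in token),     # contains a digit
        "has_hyphen": "-" in token,                       # contains a hyphen
        "mixed_case": bool(re.search(r"[a-z][A-Z]", token)),  # lower then upper
    }


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(orthographic_features("NF-kappaB"))
    # Precision and recall as reported in the abstract.
    print(round(f_measure(0.938, 0.838), 3))
```

In a full pipeline, feature dictionaries like these would be converted to numeric vectors and passed to a Random Forest classifier alongside the keyword, morphological and probabilistic features.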

Similar Papers

Biomedical Named Entity Recognition: A Poor Knowledge HMM-Based Approach
Natalia Ponomareva ... Paolo Rosso
27 Jun 2007

Extended Distributed Prototypical For Biomedical Named Entity Recognition
Maan Tareq Abd ... Masnizah Mohd
Asia-Pacific Journal of Information Technology & Multimedia | VOL. 06
30 Nov 2017

Towards Bootstrapping Biomedical Named Entity Recognition using Reinforcement Learning
Dongsheng Wang ... Hongjie Fan
16 Dec 2020

BioBBC: a multi-feature model that enhances the detection of biomedical entities
Hind Alamro ... Xin Gao
Scientific Reports | VOL. 14
02 Apr 2024
