Abstract

Protein phosphorylation is central to the regulation of most aspects of cell function. Given its importance, it has been the subject of active research as well as the focus of curation in several biological databases. We have developed Rule-based Literature Mining System for protein Phosphorylation (RLIMS-P), an online text-mining tool to help curators identify biomedical research articles relevant to protein phosphorylation. The tool presents information on protein kinases, substrates and phosphorylation sites automatically extracted from the biomedical literature. The utility of the RLIMS-P Web site has been evaluated by curators from Phospho.ELM, PhosphoGRID/BioGrid and Protein Ontology as part of the BioCreative IV user interactive task (IAT). The system achieved F-scores of 0.76, 0.88 and 0.92 for the extraction of kinase, substrate and phosphorylation sites, respectively, and a precision of 0.88 in the retrieval of relevant phosphorylation literature. The system also received highly favorable feedback from the curators in a user survey. Based on the curators’ suggestions, the Web site has been enhanced to improve its usability. In the RLIMS-P Web site, phosphorylation information can be retrieved by PubMed IDs or keywords, with an option for selecting targeted species. The result page displays a sortable table with phosphorylation information. The text evidence page displays the abstract with color-coded entity mentions and includes links to UniProtKB entries via normalization, i.e. the linking of entity mentions to database identifiers, facilitated by the GenNorm tool and by the links to the bibliography in UniProt. Login and editing capabilities are offered to any user interested in contributing to the validation of RLIMS-P results. Retrieved phosphorylation information can also be downloaded in CSV format and the text evidence in the BioC format. RLIMS-P is freely available.

Database URL: http://www.proteininformationresource.org/rlimsp/

Highlights

  • The reversible phosphorylation of proteins is central to the regulation of most aspects of cell function

  • Rule-based Literature Mining System for protein Phosphorylation (RLIMS-P) version 2.0: the RLIMS-P 2.0 system consists of several customized modules for biomedical text processing, including (i) a shallow parser based on part-of-speech tags and handcrafted rules for syntactically analyzing input text, e.g. detecting noun phrases and verb group phrases, (ii) a term classifier that annotates noun phrases with predefined semantic categories, such as protein, protein part and chemical, using rules defined over the headwords of the phrases, their affixes and the words surrounding these phrases, (iii) a pattern-based information extraction (IE) engine that extracts phrases referring to target entities and (iv) an additional IE component that identifies a phosphorylation event reported across multiple clauses and sentences (see [10, 16] for details)

  • These sets were suitable for the performance evaluation because the correctness of the automated extraction can be directly judged by the edit status of the annotated entities, i.e. an entity extracted by RLIMS-P was either validated or rejected by the curator, and an entity missed by the system was manually added during the curation task
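
The edit-status scheme described in the last highlight maps naturally onto standard evaluation counts: a validated entity is a true positive, a rejected entity is a false positive, and an entity added by the curator is a false negative. The following is a minimal sketch of that mapping, not the authors' evaluation code. The status labels ("validated", "rejected", "added"), the function name and the example counts are all assumptions made for illustration.

```python
from collections import Counter


def score(statuses):
    """Compute precision, recall and F-score from curator edit statuses.

    Assumed (hypothetical) labels, one per annotated entity:
      "validated" -> extracted by the system and confirmed (true positive)
      "rejected"  -> extracted by the system but wrong (false positive)
      "added"     -> missed by the system, added by the curator (false negative)
    """
    counts = Counter(statuses)
    tp = counts["validated"]
    fp = counts["rejected"]
    fn = counts["added"]

    # Guard against empty denominators so the function never divides by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    return precision, recall, f_score


# Made-up example data: 8 validated, 2 rejected, 2 added by the curator.
statuses = ["validated"] * 8 + ["rejected"] * 2 + ["added"] * 2
precision, recall, f_score = score(statuses)
print(f"precision={precision:.2f} recall={recall:.2f} F={f_score:.2f}")
```

The F-score here is the balanced harmonic mean of precision and recall; whether the reported figures use exactly this weighting is not stated in the text above.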


Summary

Introduction

The reversible phosphorylation of proteins is central to the regulation of most aspects of cell function. To support the efficient identification and review of phosphorylation-related literature, we have developed a rule-based information extraction (IE) system, named RLIMS-P, a Rule-based Literature Mining System for protein Phosphorylation [9, 10]. RLIMS-P is an online text-mining tool that provides an interface to identify articles relevant to protein phosphorylation, and presents information on protein kinases, substrates and phosphorylation sites extracted from the biomedical literature. RLIMS-P has been used to support curation of phosphorylation information and the construction of protein phosphorylation networks [3, 11, 12, 13]. It has been integrated as a system module to provide phosphorylation information necessary for another text-mining system, eFIP, which extracts the functional impact of phosphorylation events [14, 15].
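
To make the idea of rule-based extraction of kinases, substrates and sites concrete, here is a deliberately tiny sketch built on a single regular expression. It is not RLIMS-P's rule set, which relies on shallow parsing and semantic term classification rather than one surface pattern. The pattern, the function name and the example sentence are assumptions for illustration only.

```python
import re

# One hypothetical surface pattern: "<kinase> phosphorylates <substrate> at/on <site>".
# Real systems handle many more constructions (passives, nominalizations, etc.).
PATTERN = re.compile(
    r"(?P<kinase>[A-Za-z0-9-]+) phosphorylates "
    r"(?P<substrate>[A-Za-z0-9-]+) (?:at|on) "
    r"(?P<site>(?:Ser|Thr|Tyr)-?\d+)"
)


def extract_event(sentence):
    """Return a dict with kinase, substrate and site, or None if no match."""
    match = PATTERN.search(sentence)
    return match.groupdict() if match else None


event = extract_event("We show that PKA phosphorylates CREB at Ser133 in vitro.")
print(event)
```

A single pattern like this misses most real sentences, which is why the approach described in the source layers parsing, term classification and multiple extraction patterns.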
