Abstract

Background: The biomedical literature is the primary information source for manual protein-protein interaction annotations. Text-mining systems have been implemented to extract binary protein interactions from articles, but a comprehensive comparison between the different techniques, as well as with manual curation, had been missing.

Results: We designed a community challenge, the BioCreative II protein-protein interaction (PPI) task, based on the main steps of a manual protein interaction annotation workflow. It was structured into four distinct subtasks related to: (a) detection of protein interaction-relevant articles; (b) extraction and normalization of protein interaction pairs; (c) retrieval of the interaction detection methods used; and (d) retrieval of the actual text passages that provide evidence for protein interactions. A total of 26 teams submitted runs for at least one of the proposed subtasks. In the interaction article detection subtask, the top-scoring team reached an F-score of 0.78. In the extraction of interaction pairs and their mapping to SwissProt, a precision of 0.37 (at a recall of 0.33) was obtained. For associating articles with an experimental interaction detection method, an F-score of 0.65 was achieved. For the retrieval of the PPI passages best summarizing a given protein interaction in full-text articles, 19% of the passages returned by one of the runs corresponded to curator-selected sentences. Curators extracted only the passages that best summarized a given interaction, so many of the automatically extracted passages may have contained interaction information without corresponding to the most informative sentences.

Conclusion: The BioCreative II PPI task is the first attempt to compare the performance of text-mining tools specific to each of the basic steps of the PPI extraction pipeline. The challenges identified range from problems in full-text format conversion of articles to difficulties in detecting interactor protein pairs and then linking them to their database records. Limitations were also encountered when using a single (and possibly incomplete) reference database for protein normalization, and when restricting the search for interactor proteins to co-occurrence within a single sentence, since a mention might span neighboring sentences. Finally, distinguishing between novel, experimentally verified interactions (which are annotation relevant) and previously known interactions adds further complexity to these tasks.
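The F-scores quoted above combine precision and recall. As a minimal illustration (not part of the challenge evaluation code, and assuming the balanced F1 measure is meant), the following snippet computes the F-score implied by the interaction pair subtask figures:

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Balanced F-measure; with beta == 1 it is the harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)

# Interaction pair subtask figures quoted in the abstract (precision 0.37, recall 0.33).
print(round(f_score(0.37, 0.33), 2))  # -> 0.35
```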

Highlights

  • The biomedical literature is the primary information source for manual protein-protein interaction annotations

  • Some interaction databases, such as HPRD (Human Protein Reference Database) [1], HomoMINT [2], and MIPS (Munich Information Center for Protein Sequences) [3], focus on certain taxa and store mainly information on human or mammalian proteins

  • The aim of the interaction article subtask (IAS) was to determine whether text-mining tools can detect and rank interaction annotation-relevant articles based on PubMed titles and abstracts only (a minimal illustrative sketch follows this list)
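As a hypothetical illustration of the kind of system the IAS targets (not one of the systems submitted to the challenge), the sketch below ranks PubMed titles and abstracts by predicted interaction relevance using a simple bag-of-words classifier; the example texts and labels are invented.

```python
# Minimal IAS-style baseline sketch: rank abstracts by predicted interaction relevance.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: 1 = interaction annotation-relevant, 0 = not relevant.
train_texts = [
    "Protein A co-immunoprecipitates with protein B in HeLa cells.",
    "We report the crystal structure of a monomeric enzyme.",
]
train_labels = [1, 0]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(train_texts, train_labels)

# Score unseen titles/abstracts and sort them by relevance for curation triage.
new_abstracts = ["Kinase X phosphorylates and binds substrate Y in vivo."]
scores = model.predict_proba(new_abstracts)[:, 1]
ranked = sorted(zip(new_abstracts, scores), key=lambda pair: pair[1], reverse=True)
print(ranked)
```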


Introduction

The biomedical literature is the primary information source for manual protein-protein interaction annotations. To capture and provide efficient access to the underlying information, structured interaction annotations have been stored in public databases. These databases vary in annotation depth and in the type of interactions covered, but a common characteristic is that the annotations are primarily extracted by human curators from relevant publications. The interaction databases MINT (Molecular Interactions Database) [6] and IntAct [7] contain the largest numbers of nonredundant direct human protein-protein interactions, exceeded in number only by HPRD. They provide literature references relevant to the individual interactions, together with the experimental interaction detection method used as supporting evidence [8].

