Abstract

Background: Deciphering physical protein-protein interactions is fundamental to elucidating both the functions of proteins and biological processes. The development of high-throughput experimental technologies such as yeast two-hybrid screening has produced an explosion of interaction data. Because manual curation is costly and time-consuming, there is an urgent need for text-mining tools that facilitate the extraction of such information. The BioCreative (Critical Assessment of Information Extraction systems in Biology) challenge evaluation provided common standards and shared evaluation criteria to enable comparisons among different approaches.

Results: During the benchmark evaluation of BioCreative 2006, all of our results ranked in the top three places. In the task of filtering out articles irrelevant to physical protein interactions, our method achieved a precision of 75.07%, a recall of 81.07%, and an AUC (area under the receiver operating characteristic curve) of 0.847. In the task of identifying protein mentions and normalizing them to molecule identifiers, our method was competitive among the submitted runs, with a precision of 34.83%, a recall of 24.10%, and an F1 score of 28.5%. In extracting protein interaction pairs, our profile-based method was competitive on the SwissProt-only subset (precision = 36.95%, recall = 32.68%, and F1 score = 30.40%) and on the entire dataset (30.96%, 29.35%, and 26.20%, respectively). From the biologist's point of view, however, these results are still far from satisfactory. The error analysis presented in this report suggests how performance could be improved: about three-quarters of the false negatives (532/698) were due to protein normalization problems, and about one-quarter were due to failures in correctly extracting the interactions themselves.

Conclusion: We present a text-mining framework to extract physical protein-protein interactions from the literature. Three key issues are addressed, namely filtering irrelevant articles, identifying protein names and normalizing them to molecule identifiers, and extracting protein-protein interactions. Our system is among the top three performers in the benchmark evaluation of BioCreative 2006. The tool will be helpful for manual interaction curation and can greatly facilitate the process of extracting protein-protein interactions.
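The scores above are standard retrieval metrics. The Python sketch below shows how they relate; it assumes binary relevance labels and real-valued classifier scores for the filtering task, and the function names (`precision_recall_f1`, `f1`, `roc_auc`) are illustrative only, not part of the authors' system. It recomputes two figures quoted in the abstract (the 28.5% normalization F1 and the share of false negatives attributed to normalization) and, with hypothetical toy numbers, contrasts a per-article average of F1 with the F1 of averaged precision and recall.

```python
from statistics import mean


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (any consistent scale); 0 if both are 0."""
    return 2 * p * r / (p + r) if p + r else 0.0


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r, f1(p, r)


def roc_auc(scores: list[float], labels: list[int]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


# F1 recomputed from the normalization precision/recall quoted in the abstract
print(round(f1(34.83, 24.10), 1))  # 28.5

# Share of false negatives attributed to normalization (532 of 698)
print(round(532 / 698, 3))  # 0.762

# Hypothetical toy data: (precision, recall) for two articles
per_article = [(1.0, 0.0), (0.0, 1.0)]
macro_f1 = mean(f1(p, r) for p, r in per_article)
f1_of_means = f1(mean(p for p, _ in per_article), mean(r for _, r in per_article))
print(macro_f1, f1_of_means)  # 0.0 0.5
```

Because the harmonic mean is concave, a per-article average of F1 can never exceed the F1 of the averaged precision and recall. If the interaction-pair scores were averaged per article (an assumption about the evaluation protocol), this would explain why the reported F1 of 30.40% sits below the roughly 34.7% obtained by combining 36.95% precision and 32.68% recall directly.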

Highlights

  • Deciphering physical protein-protein interactions is fundamental to elucidating both the functions of proteins and biological processes

  • The task is difficult because the relevance of some articles cannot be determined from their abstracts alone, and curators usually must obtain evidence from the full text

  • The top 50 features, ranked by significance under the χ² test, were selected from the remaining training dataset. Based on these 50 features, three probability distributions were estimated by using Equation 3: from the leave-out dataset, from the remaining training dataset, and from the official test dataset
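The χ² feature ranking in the last highlight can be illustrated with a minimal Python sketch. It assumes each feature is a binary term-presence indicator and each training article carries a binary relevant/irrelevant label; the 2×2 contingency formulation, the function names `chi_square` and `top_k_features`, and the toy corpus are assumptions for illustration, not the authors' implementation. Only the cut-off of 50 features comes from the text, and Equation 3 is not reproduced here.

```python
from collections import Counter


def chi_square(n11: int, n10: int, n01: int, n00: int) -> float:
    """Chi-square statistic of the 2x2 term/class contingency table.

    n11: relevant docs containing the term   n10: irrelevant docs containing it
    n01: relevant docs lacking the term      n00: irrelevant docs lacking it
    """
    n = n11 + n10 + n01 + n00
    # Product of the four marginal totals (class sizes and term presence/absence)
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def top_k_features(docs: list[set[str]], labels: list[int], k: int = 50) -> list[str]:
    """Rank terms by chi-square against the class label and keep the top k."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    in_pos: Counter = Counter()  # number of relevant docs containing each term
    in_neg: Counter = Counter()  # number of irrelevant docs containing each term
    for terms, y in zip(docs, labels):
        (in_pos if y == 1 else in_neg).update(terms)
    vocab = set(in_pos) | set(in_neg)
    scores = {
        t: chi_square(in_pos[t], in_neg[t], n_pos - in_pos[t], n_neg - in_neg[t])
        for t in vocab
    }
    # Highest score first; ties broken alphabetically so the output is deterministic
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus: each article reduced to its set of terms
docs = [{"interact", "bind"}, {"bind", "complex"}, {"expression"}, {"expression", "tissue"}]
labels = [1, 1, 0, 0]
print(top_k_features(docs, labels, k=2))  # ['bind', 'expression']
```

Terms that perfectly separate the classes ("bind" only in relevant articles, "expression" only in irrelevant ones) receive the highest scores; note that χ² rewards association in either direction, so strongly negative indicators are kept as well.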


Summary

Introduction

Deciphering physical protein-protein interactions is fundamental to elucidating both the functions of proteins and biological processes. More and more interaction data are being published in the literature as a result of the development of high-throughput experimental technologies, such as yeast two-hybrid screening and affinity purification coupled with mass spectrometry. Although these techniques make it possible to study protein interactions on a much larger scale, they at times suffer from poor resolution. To provide reliable protein interaction data for biologists, interaction databases such as the Molecular Interaction Database (MINT) [3] and IntAct [4] manually detect and curate protein interactions from different information sources. However, it is becoming difficult for database curators to keep up with the rapidly expanding literature and the increasing number of newly discovered proteins.

Methods
Results
Conclusion
