Abstract

Background: The rapid publication of important research in the biomedical literature makes it increasingly difficult for researchers to keep current with significant work in their area of interest.

Results: This paper reports a scalable method for the discovery of protein-protein interactions in Medline abstracts, using a combination of text analytics, statistical and graphical analysis, and a set of easily implemented rules. Applying these techniques to 12,300 abstracts yielded a precision of 0.61 and a recall of 0.97 (f = 0.74); when two-hop and three-hop relations discovered by graphical analysis were allowed, the precision was 0.74 (f = 0.83).

Conclusion: This combination of linguistic and statistical approaches appears to provide the highest precision and recall thus far reported in detecting protein-protein relations using text analytic approaches.
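The f values quoted in the abstract can be checked against the stated precision and recall. The sketch below assumes that "f" denotes the balanced F-measure (the harmonic mean of precision and recall) and that recall remains 0.97 in the multi-hop case, which the abstract does not restate; the function names are illustrative only and are not taken from the paper.

```python
import math


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumption: this is the 'f' reported in the abstract.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def truncate(value: float, places: int = 2) -> float:
    """Drop digits beyond `places` decimals without rounding up."""
    factor = 10 ** places
    return math.floor(value * factor) / factor


# Precision and recall as stated in the abstract.
direct = f_measure(0.61, 0.97)

# Assumption: recall stays at 0.97 once two-hop and three-hop relations are allowed.
with_hops = f_measure(0.74, 0.97)

print(f"direct relations:    f = {direct:.4f}")
print(f"with multi-hop data: f = {with_hops:.4f}")
```

Under these assumptions the two values come out at roughly 0.749 and 0.840, which match the reported 0.74 and 0.83 when truncated (rather than rounded) to two decimal places.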

Summary

Introduction

The rapid publication of important research in the biomedical literature makes it increasingly difficult for researchers to keep current with significant work in their area of interest. While the experimental study of protein-protein interactions remains the most important way of obtaining these data, the number of such interactions reported in the literature is substantial and growing rapidly. There are a number of tabulations of these interactions, such as that provided by the Munich Institute for Protein Sequence (MIPS); these tabulations are of necessity incomplete. To address this problem, we have been developing a group of biology-specific computational annotators that work in conjunction with our group's text analytic software, for the discovery of protein-protein relations in text. We undertook a study that uses a combination of computational linguistics, statistics and domain-specific rules to detect protein-protein interactions in a set of Medline abstracts. A scalable, robust system for protein interaction discovery provides a major information tool for molecular biologists.
