Abstract

Motivation: A complete repository of gene–gene interactions is key for understanding cellular processes, human disease and drug response. These gene–gene interactions include both protein–protein interactions and transcription factor interactions. The majority of known interactions are found in the biomedical literature. Interaction databases, such as BioGRID and ChEA, annotate these gene–gene interactions; however, curation becomes difficult as the literature grows exponentially. DeepDive is a trained system for extracting information from a variety of sources, including text. In this work, we used DeepDive to extract both protein–protein and transcription factor interactions from over 100 000 full-text PLOS articles.

Methods: We built an extractor for gene–gene interactions that identified candidate gene–gene relations within an input sentence. For each candidate relation, DeepDive computed a probability that the relation was a correct interaction. We evaluated this system against the Database of Interacting Proteins and against randomly curated extractions.

Results: Our system achieved 76% precision and 49% recall in extracting direct and indirect interactions involving gene symbols co-occurring in a sentence. For randomly curated extractions, the system achieved between 62% and 83% precision based on direct or indirect interactions, as well as sentence-level and document-level precision. Overall, our system extracted 3356 unique gene pairs using 724 features from over 100 000 full-text articles.

Availability and implementation: Application source code is publicly available at https://github.com/edoughty/deepdive_genegene_app

Contact: russ.altman@stanford.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

  • A complete repository of gene–gene interactions is key for understanding cellular processes, human disease and drug response

  • We developed a gene–gene relation extractor for the DeepDive system and applied it to the entirety of three PLOS journals

  • We extracted direct physical protein–protein interactions (PPIs), indirect interactions and transcription factor interactions (TFIs) to create a complete view of known protein and gene interactions

Introduction

A complete repository of gene–gene interactions is key for understanding cellular processes, human disease and drug response. These interactions inform gene network analyses that typically rely on curated interaction databases. Two types of interactions are critical for understanding how a protein or gene affects biological or disease processes: physical protein–protein interactions (PPIs) and transcription factor interactions (TFIs). PPIs include interactions where two proteins physically bind to one another to form a complex or otherwise modify the function of one or both proteins. TFIs involve transcription factors directly binding upstream of a gene to control transcription of that gene. Modifications to PPIs and/or TFIs can have a detrimental effect on their associated cellular processes.
