Abstract

Motivation: A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often only done rigorously for a few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40 000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem.

Results: We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as a cross-species protein–protein interaction network. Our approach specifically outputs information in the structure of the GO and utilizes the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, in particular for predicting cellular locations.

Availability and implementation: Web server: http://deepgo.bio2vec.net, Source code: https://github.com/bio-ontology-research-group/deepgo

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

  • Advances in sequencing technology have led to a large and rapidly increasing number of genetic and protein sequences, and this number is expected to increase further through sequencing of additional organisms as well as metagenomics

  • We demonstrate that our model improves performance of function prediction over a BLAST baseline, and performs well in predicting cellular locations of proteins

  • The intention is that this model can identify both explicit dependencies between classes in Gene Ontology (GO), as expressed by relations between classes encoded in the ontology, as well as implicit dependencies such as frequently co-occurring classes
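The explicit dependencies mentioned above can be illustrated with a minimal sketch of hierarchical multi-label encoding: when a protein is annotated with a class, it is also treated as annotated with every ancestor of that class. The toy hierarchy, the class names, and the helper functions `ancestors` and `encode` below are all hypothetical and chosen for illustration; this is not the paper's implementation, and the real GO contains over 40 000 classes.

```python
# Toy is-a hierarchy (child -> list of parents).
# Illustrative only: a hypothetical four-class fragment, not actual GO content.
PARENTS = {
    "cellular_component": [],
    "organelle": ["cellular_component"],
    "nucleus": ["organelle"],
    "membrane": ["cellular_component"],
}

# Fixed column order for the multi-label target vector.
CLASSES = sorted(PARENTS)


def ancestors(term):
    """Return the set containing `term` and all of its ancestors."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def encode(annotations):
    """Build a binary label vector, propagating each annotation to its ancestors."""
    closed = set()
    for term in annotations:
        closed |= ancestors(term)
    return [1 if cls in closed else 0 for cls in CLASSES]


if __name__ == "__main__":
    # A protein annotated only with "nucleus" is also labeled with
    # "organelle" and "cellular_component".
    print(dict(zip(CLASSES, encode(["nucleus"]))))
```

Encoding labels this way makes a single annotation switch on several columns at once, which is one reason the task is multi-label even for proteins with a single recorded function.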



Introduction

Advances in sequencing technology have led to a large and rapidly increasing number of genetic and protein sequences, and this number is expected to increase further through sequencing of additional organisms as well as metagenomics. Identifying protein functions is challenging and commonly requires in vitro or in vivo experiments [12], so experimental functional annotation of proteins will not scale with the number of novel protein sequences becoming available. Function prediction can use several sources of information, including protein–protein interactions [24], genetic interactions [12], evolutionary relations [14], protein structures and structure prediction methods [19], literature [28], or combinations of these [25]. These methods have been developed over many years, and their predictive performance is improving steadily [22].

