DEEPred: Automated Protein Function Prediction with Multi-task Feed-forward Deep Neural Networks

Ahmet Sureyya Rifaioglu, Tunca Doğan, Maria Jesus Martin, Rengul Cetin-Atalay, Volkan Atalay

doi:10.1038/s41598-019-43708-3

Abstract

Automated protein function prediction is critical for the annotation of uncharacterized protein sequences, where accurate prediction methods are still required. Recently, deep learning-based methods have outperformed conventional algorithms in computer vision and natural language processing due to the prevention of overfitting and efficient training. Here, we propose DEEPred, a hierarchical stack of multi-task feed-forward deep neural networks, as a solution to Gene Ontology (GO) based protein function prediction. DEEPred was optimized through rigorous hyper-parameter tests, and benchmarked using three types of protein descriptors, training datasets of varying sizes, and GO terms from different levels. Furthermore, in order to explore how training with larger but potentially noisy data would change the performance, electronically made GO annotations were also included in the training process. The overall predictive performance of DEEPred was assessed using the CAFA2 and CAFA3 challenge datasets, in comparison with state-of-the-art protein function prediction methods. Finally, we evaluated selected novel annotations produced by DEEPred with a literature-based case study considering the ‘biofilm formation process’ in Pseudomonas aeruginosa. This study reports that deep learning algorithms have significant potential in protein function prediction, particularly when the source data is large. The neural network architecture of DEEPred can also be applied to the prediction of other types of ontological associations. The source code and all datasets used in this study are available at: https://github.com/cansyl/DEEPred.
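
To make the phrase "multi-task feed-forward deep neural network" concrete, the sketch below shows a forward pass through one such model: a shared hidden layer feeds one sigmoid output unit per GO term, so a single network scores several terms at once. This is an illustrative sketch only, not the DEEPred implementation. The layer sizes, the random untrained weights, the 0.5 decision threshold, the placeholder GO identifiers, and the helper names `dense`, `random_layer` and `predict` are all assumptions made for the example.

```python
import math
import random


def relu(x):
    return x if x > 0.0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """Apply one fully connected layer: activation(W @ inputs + b)."""
    return [
        activation(sum(w * v for w, v in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def random_layer(n_out, n_in, rng):
    """Create small random weights and zero biases (untrained placeholder)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def predict(features, hidden, output, go_terms, threshold=0.5):
    """Score every GO term of one multi-task model for a single protein."""
    h = dense(features, hidden[0], hidden[1], relu)     # shared hidden layer
    scores = dense(h, output[0], output[1], sigmoid)    # one sigmoid unit per task
    return {term: (score, score >= threshold) for term, score in zip(go_terms, scores)}


rng = random.Random(0)
go_terms = ["GO:0000001", "GO:0000002", "GO:0000003"]  # placeholder identifiers (assumption)
n_features, n_hidden = 8, 4                             # hypothetical layer sizes
hidden = random_layer(n_hidden, n_features, rng)
output = random_layer(len(go_terms), n_hidden, rng)
features = [rng.random() for _ in range(n_features)]    # stand-in for a protein descriptor
predictions = predict(features, hidden, output, go_terms)
for term, (score, called) in predictions.items():
    print(f"{term}: score={score:.3f} predicted={called}")
```

Because the output units share the hidden layer, each GO term is scored independently with its own sigmoid while reusing the same learned representation. In a hierarchical stack, several such models would each cover a different group of GO terms.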

Highlights

CAFA is an initiative whose aim is the large-scale evaluation of protein function prediction methods, and the results of the first two CAFA challenges showed that protein function prediction is still a challenging area[9,10]
In terms of model architecture and properties, deep neural networks (DNNs) are classified into multiple groups; the most popular architectures are the feed-forward DNN, recurrent neural network (RNN), restricted Boltzmann machine (RBM) and deep belief network (DBN), autoencoder deep neural network, convolutional neural network (CNN), and graph convolutional network (GCN)[14,15,18,19,22,23]
We identified 8 genes in the P. aeruginosa reference genome that are associated with biofilm formation, but not annotated with the corresponding Gene Ontology (GO) term or its functionally related neighboring terms in the source databases at the time of this analysis

Summary

Introduction

CAFA is an initiative whose aim is the large-scale evaluation of protein function prediction methods, and the results of the first two CAFA challenges showed that protein function prediction is still a challenging area[9,10]. One of the most critical obstacles to developing a practical DNN-based predictive tool is the computationally intensive training process, which limits the size of the input data and the number of functional categories that can be included in the system. For this reason, previous studies mostly focused on a small number of protein families or GO terms. There is a need for new predictive approaches with high performance and real-world usability that can support in vitro studies in protein function identification.

Journal: Scientific Reports
Publication Date: May 14, 2019
Citations: 94
License type: open-access

Similar Papers

Predicting human protein function with multi-task deep neural networks.
Rui Fa ... Domenico Cozzetto
PLOS ONE | VOL. 13
11 Jun 2018

NetGO 2.0: improving large-scale protein function prediction with massive sequence, text, domain, family and network information.
Shuwei Yao ... Ronghui You
Nucleic Acids Research | VOL. 49
26 May 2021

HMMeta
Sola Gbenro ... Kyle Hippe
21 Sep 2020

Evaluating the impact of topological protein features on the negative examples selection
Paolo Boldi ... Marco Frasca
BMC Bioinformatics | VOL. 19
01 Nov 2018
