DeepQA: improving the estimation of single protein model quality with deep belief networks.

Renzhi Cao,Jianlin Cheng,Debswapna Bhattacharya,Jie Hou

doi:10.1186/s12859-016-1405-y

Abstract

Background: Protein quality assessment (QA), which is useful for ranking and selecting protein models, has long been viewed as one of the major challenges for protein tertiary structure prediction. In particular, estimating the quality of a single protein model, which is important for selecting a few good models out of a large model pool consisting of mostly low-quality models, is still a largely unsolved problem.

Results: We introduce a novel single-model quality assessment method, DeepQA, based on a deep belief network that uses a number of selected features describing the quality of a model from different perspectives, such as energy, physicochemical characteristics, and structural information. The deep belief network is trained on several large datasets consisting of models from the Critical Assessment of Protein Structure Prediction (CASP) experiments, several publicly available datasets, and models generated by our in-house ab initio method. Our experiments demonstrate that the deep belief network performs better than Support Vector Machines and Neural Networks on the protein model quality assessment problem, and that DeepQA achieves state-of-the-art performance on the CASP11 dataset. It also outperformed two well-established methods in selecting good outlier models from a large set of mostly low-quality models generated by ab initio modeling methods.

Conclusion: DeepQA is a useful deep learning tool for protein single-model quality assessment and protein structure prediction. The source code, executable, documentation, and training/test datasets of DeepQA for Linux are freely available to non-commercial users at http://cactus.rnet.missouri.edu/DeepQA/.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1405-y) contains supplementary material, which is available to authorized users.

Highlights

Protein quality assessment (QA), which is useful for ranking and selecting protein models, has long been viewed as one of the major challenges for protein tertiary structure prediction
We evaluate the accuracy of a new single-model QA method (DeepQA) on 84 protein targets, using both the stage-one and stage-two models of the 11th community-wide experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP11), which are available on the official CASP website
For the deep belief network, we test the number of hidden nodes in each of the first and second layers of Restricted Boltzmann Machines (RBMs) from 5 to 40, the learning rate Ɛ from 0.0001 to 0.01, the weight cost ω from 0.001 to 0.7, and the momentum ν from 0.5 to 0.9
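The hyperparameter ranges in the last highlight can be explored with an exhaustive grid search. The following is a minimal sketch, not the authors' procedure: only the range endpoints come from the text, while the intermediate grid points, the names `GRID`, `iter_configs`, and `grid_search`, and the caller-supplied `score_fn` (for example, a validation metric returned after training a deep belief network with the given settings) are assumptions made for illustration.

```python
from itertools import product

# Range endpoints come from the text; the intermediate grid points are assumed.
GRID = {
    "hidden1": [5, 10, 20, 40],            # hidden nodes in the first RBM layer
    "hidden2": [5, 10, 20, 40],            # hidden nodes in the second RBM layer
    "learning_rate": [0.0001, 0.001, 0.01],
    "weight_cost": [0.001, 0.01, 0.1, 0.7],
    "momentum": [0.5, 0.7, 0.9],
}


def iter_configs(grid):
    """Yield every combination of grid values as a dict keyed by parameter name."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def grid_search(grid, score_fn):
    """Return (best_config, best_score), where a higher score is better.

    score_fn is a hypothetical callable that trains and validates a model
    for one configuration and returns a single number.
    """
    best_config, best_score = None, float("-inf")
    for config in iter_configs(grid):
        score = score_fn(config)
        if score > best_score:  # on ties, the first configuration seen is kept
            best_config, best_score = config, score
    return best_config, best_score
```

A caller would pass a function that trains the network for one configuration and returns its validation performance. With the assumed grid above, this evaluates 4 × 4 × 3 × 4 × 3 = 576 configurations, so in practice the scoring function dominates the cost.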

Summary

Introduction

Protein quality assessment (QA), which is useful for ranking and selecting protein models, has long been viewed as one of the major challenges for protein tertiary structure prediction. Methods for protein tertiary structure prediction fall into two broad categories. The first is template-based modeling [5,6,7,8,9,10,11], which uses the known structures of homologous proteins as templates to build a protein structure model; examples include I-TASSER [12], FALCON [10, 11], MUFOLD [13], RaptorX [14], and MTMG [15]. The second is ab initio modeling [16,17,18,19,20,21], which builds the structure from scratch, without using existing template structure information.

Methods

Results

Conclusion