MASS: predict the global qualities of individual protein models using random forests and novel statistical potentials

Tong Liu, Zheng Wang

doi:10.1186/s12859-020-3383-3

Abstract

Background: Protein model quality assessment (QA) is an essential procedure in protein structure prediction. QA methods can predict the qualities of protein models and identify good models from decoys. Clustering-based methods need a certain number of models as input. However, if a pool of models is not available, methods that need only a single model as input are indispensable.

Results: We developed MASS, a QA method that predicts the global qualities of individual protein models using random forests and various novel energy functions. We designed six novel energy functions, or statistical potentials, that capture the structural characteristics of a protein model and can also be used in other protein-related bioinformatics research. MASS potentials demonstrated higher importance than the energy functions of RWplus, GOAP, DFIRE, and Rosetta when the scores they generated were used as machine learning features. MASS outperforms most of the four top-performing CASP11 single-model methods for global quality assessment on all four evaluation criteria officially used by CASP, which measure the abilities to assign relative and absolute scores, identify the best model from decoys, and distinguish between good and bad models. MASS has also achieved performance comparable with the leading QA methods in CASP12 and CASP13.

Conclusions: MASS and the source code for all MASS potentials are publicly available at http://dna.cs.miami.edu/MASS/.

Highlights

Protein model quality assessment (QA) is an essential procedure in protein structure prediction
We evaluated MASS along with other QA methods in CASP11, CASP12, and CASP13 and found that MASS outperforms most of the methods in CASP11 and is comparable with the leading methods in CASP12 and CASP13
Following the way the Critical Assessment of Techniques for Protein Structure Prediction (CASP) officially evaluates QA methods that predict the global qualities [1] of protein models, we assessed our method, together with four methods that participated in CASP11, seven in CASP12, and 16 in CASP13, by four criteria that measure the abilities to assign relative scores, identify the best model from decoys, assign absolute scores, and discriminate good models from bad models
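The four abilities named above can be illustrated with a minimal Python sketch for a single target. This is not the official CASP evaluation code: the function names, the 0 to 1 score scale, and the 0.5 cutoff separating good from bad models are assumptions made for illustration, and the concrete statistics chosen here (Pearson correlation, top-1 loss, mean absolute difference, ROC AUC) are common stand-ins for each ability rather than a confirmed description of the paper's exact procedure.

```python
from math import sqrt


def pearson(predicted, true):
    """Relative scoring: linear correlation between predicted and true quality."""
    n = len(true)
    mean_p = sum(predicted) / n
    mean_t = sum(true) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, true))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_t = sum((t - mean_t) ** 2 for t in true)
    return cov / sqrt(var_p * var_t)


def top1_loss(predicted, true):
    """Best-model selection: true quality lost by trusting the top-ranked model."""
    picked = max(range(len(predicted)), key=predicted.__getitem__)
    return max(true) - true[picked]


def mean_abs_diff(predicted, true):
    """Absolute scoring: average deviation of predicted from true quality."""
    return sum(abs(p - t) for p, t in zip(predicted, true)) / len(true)


def roc_auc(predicted, true, good_cutoff=0.5):
    """Good/bad discrimination: probability that a good model outscores a bad one.

    good_cutoff is an assumed threshold on the true quality score.
    """
    good = [p for p, t in zip(predicted, true) if t >= good_cutoff]
    bad = [p for p, t in zip(predicted, true) if t < good_cutoff]
    wins = sum((g > b) + 0.5 * (g == b) for g in good for b in bad)
    return wins / (len(good) * len(bad))


# Hypothetical scores for four models of one target (not data from the paper).
predicted = [0.62, 0.48, 0.71, 0.30]
true = [0.70, 0.40, 0.65, 0.35]

print("Pearson correlation:", pearson(predicted, true))
print("Top-1 loss:", top1_loss(predicted, true))
print("Mean absolute difference:", mean_abs_diff(predicted, true))
print("ROC AUC:", roc_auc(predicted, true))
```

In a multi-target benchmark, each statistic would typically be computed per target and then averaged; whether to pool or average is a design choice that this sketch leaves open.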

Summary

Introduction

Protein model quality assessment (QA) is an essential procedure in protein structure prediction. Clustering-based methods need a certain number of models as input. If a pool of models is not available, methods that need only a single model as input are indispensable. The QA of protein models plays an important role in protein tertiary structure prediction and model refinement [1]. Compared with clustering-based methods, which require a pool of protein models as input, single-model methods need only an individual protein model [8]. Single-model methods have used various features for training machine learning models, such as energy functions [7, 11] and the consistency between predicted and assigned secondary structures [8]. Liu et al. [8] developed a deep learning architecture based on stacked denoising autoencoders (SdA) to predict residue-specific qualities of individual models. Cao et al. developed DeepQA [5], in which energy functions, physicochemical characteristics, and

Methods

Results

Conclusion