MQAPRank: improved global protein model quality assessment by learning-to-rank

Xiaoyang Jing, Qiwen Dong

doi:10.1186/s12859-017-1691-z

Abstract

Background: Protein structure prediction has made considerable progress over the last few decades, and a large number of models can now be predicted for a given sequence. Consequently, assessing the quality of predicted protein models is one of the key components of successful protein structure prediction. Over the past years, a number of methods have been developed to address this issue; they can be roughly divided into three categories: single methods, quasi-single methods and clustering (or consensus) methods. Although these methods have achieved considerable success at different levels, accurate protein model quality assessment is still an open problem.

Results: Here, we present MQAPRank, a global protein model quality assessment program based on learning-to-rank. MQAPRank first sorts the decoy models using a single method based on a learning-to-rank algorithm to indicate their relative qualities for the target protein. It then takes the top five models as references and predicts the qualities of the other models from the average GDT_TS scores between the reference models and each other model. Benchmarked on the CASP11 and 3DRobot datasets, MQAPRank achieved better performance than other leading protein model quality assessment methods. Recently, MQAPRank participated in CASP12 under the group name FDUBio and achieved state-of-the-art performance.

Conclusions: MQAPRank provides a convenient and powerful tool for protein model quality assessment with state-of-the-art performance; it is useful for both protein structure prediction and model quality assessment.

Highlights

We present MQAPRank, a global protein model quality assessment program based on learning-to-rank
Protein structure prediction has made considerable progress over the last few decades, and a large number of models can now be predicted for a given sequence
Table notes (table body not recovered; surviving row fragment: ModFOLD6_cor, quasi-single)
a Best 150: the dataset comprising the best 150 models submitted for a target according to the benchmark consensus method
b Select 20: the dataset comprising 20 models spanning the whole range of server model difficulty on each target
c Diff: the average difference between the predicted and GDT_TS scores
d MCC: Matthews correlation coefficient
e AUC: the area under the ROC curve
f Loss: the loss in quality between the best available model and the predicted best model
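
The Diff and Loss measures defined in the notes above can be illustrated with a minimal Python sketch. It is not the official CASP evaluation code. The function names `mean_diff` and `selection_loss` are hypothetical, and the use of the absolute difference for Diff is an assumption, since the note does not say whether the difference is signed.

```python
from typing import Dict


def mean_diff(predicted: Dict[str, float], observed: Dict[str, float]) -> float:
    """Average difference between predicted and observed GDT_TS scores.

    Assumption: the absolute difference is used; the source does not say
    whether the difference is signed.
    """
    return sum(abs(predicted[m] - observed[m]) for m in predicted) / len(predicted)


def selection_loss(predicted: Dict[str, float], observed: Dict[str, float]) -> float:
    """Quality lost by choosing the top-predicted model instead of the best one."""
    # Model that the quality assessment method ranks first.
    top_predicted = max(predicted, key=predicted.get)
    # Observed GDT_TS of the best model actually available for the target.
    best_available = max(observed.values())
    return best_available - observed[top_predicted]
```

Both functions take dictionaries keyed by model name for a single target. A per-dataset figure would then be the average over targets.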

Summary

Results

We present MQAPRank, a global protein model quality assessment program based on learning-to-rank. MQAPRank first sorts the decoy models using a single method based on a learning-to-rank algorithm to indicate their relative qualities for the target protein. It then takes the top five models as references and predicts the qualities of the other models from the average GDT_TS scores between the reference models and each other model. Benchmarked on the CASP11 and 3DRobot datasets, MQAPRank achieved better performance than other leading protein model quality assessment methods. MQAPRank participated in CASP12 under the group name FDUBio and achieved state-of-the-art performance.
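
The two-stage procedure described above (rank with a single method, then score against the top-ranked references) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function name `reference_based_scores`, the `single_scores` input (standing in for the learning-to-rank output) and the `pairwise_gdt_ts` callable (standing in for a structural comparison tool) are all hypothetical. How the reference models themselves are scored is not stated in the text; here each reference is compared with the remaining references only.

```python
from typing import Callable, Dict, Sequence


def reference_based_scores(
    models: Sequence[str],
    single_scores: Dict[str, float],
    pairwise_gdt_ts: Callable[[str, str], float],
    n_refs: int = 5,
) -> Dict[str, float]:
    """Score each model by its mean GDT_TS against the top-ranked references."""
    # Step 1: rank models by the single-method score (higher is better).
    ranked = sorted(models, key=lambda m: single_scores[m], reverse=True)
    references = ranked[:n_refs]

    scores: Dict[str, float] = {}
    for model in models:
        # Assumption: a reference model is never compared with itself.
        others = [ref for ref in references if ref != model]
        if not others:
            # Only one model available: fall back to its single-method score.
            scores[model] = single_scores[model]
            continue
        # Step 2: average pairwise GDT_TS between this model and the references.
        scores[model] = sum(pairwise_gdt_ts(model, ref) for ref in others) / len(others)
    return scores
```

Passing the comparison as a callable keeps the sketch independent of any particular structure-alignment program; in practice the pairwise scores would come from an external tool that superimposes two models of the same target.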

Conclusions

Background

Method Type

Results and discussion