MetaMQAP: A meta-server for the quality assessment of protein models

Marcin Pawlowski, Michal J Gajda, Ryszard Matlak, Janusz M Bujnicki

doi:10.1186/1471-2105-9-403

Abstract

Background: Computational models of protein structure are usually inaccurate and exhibit significant deviations from the true structure. The utility of models depends on the degree of these deviations. A number of predictive methods have been developed to discriminate between the globally incorrect and approximately correct models. However, only a few methods predict correctness of different parts of computational models. Several Model Quality Assessment Programs (MQAPs) have been developed to detect local inaccuracies in unrefined crystallographic models, but it is not known if they are useful for computational models, which usually exhibit different and much more severe errors.

Results: The ability to identify local errors in models was tested for eight MQAPs (VERIFY3D, PROSA, BALA, ANOLEA, PROVE, TUNE, REFINER, PROQRES) on 8251 models from the CASP-5 and CASP-6 experiments, by calculating Spearman's rank correlation coefficients between the per-residue scores of these methods and the local deviations between C-alpha atoms in the models vs. experimental structures. As a reference, we calculated the correlation between the local deviations and trivial features that can be calculated for each residue directly from the models, i.e. solvent accessibility, depth in the structure, and the number of local and non-local neighbours. We found that absolute correlations of scores returned by the MQAPs and local deviations were poor for all methods. In addition, scores of PROQRES and several other MQAPs strongly correlate with 'trivial' features. Therefore, we developed MetaMQAP, a meta-predictor based on a multivariate regression model, which uses scores of the above-mentioned methods, but in which trivial parameters are controlled. MetaMQAP predicts the absolute deviation (in Ångströms) of individual C-alpha atoms between the model and the unknown true structure as well as global deviations (expressed as root mean square deviation and GDT_TS scores). Local model accuracy predicted by MetaMQAP shows an impressive correlation coefficient of 0.7 with true deviations from native structures, a significant improvement over all constituent primary MQAP scores. The global MetaMQAP score is correlated with model GDT_TS on the level of 0.89.

Conclusion: Finally, we compared our method with the MQAPs that scored best in the 7th edition of CASP, using CASP7 server models (not included in the MetaMQAP training set) as the test data. In our benchmark, MetaMQAP is outperformed only by PCONS6 and method QA_556, both of which require comparison of multiple alternative models and score each of them depending on its similarity to other models. MetaMQAP is, however, the best among methods capable of evaluating just single models. We implemented MetaMQAP as a web server available for free use by all academic users at the URL
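The evaluation described above rests on Spearman's rank correlation between per-residue MQAP scores and per-residue C-alpha deviations. The following is a minimal sketch of that statistic only, not the authors' pipeline: the function names and the six-residue data set are hypothetical and chosen purely for illustration. Tied values receive the mean of the ranks they span, and the coefficient is the Pearson correlation of the two rank vectors.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-residue data for one model (not taken from the paper):
# C-alpha deviation from the experimental structure, in Angstroms, and a
# quality score from some MQAP where higher is assumed to mean "better".
deviations = [0.4, 0.9, 1.5, 3.2, 6.8, 2.1]
scores = [0.91, 0.85, 0.70, 0.40, 0.10, 0.55]

rho = spearman(scores, deviations)
print(f"Spearman rho = {rho:.3f}")
```

The sign of the coefficient depends on whether a given method assigns high or low scores to well-modelled residues, which is one reason to compare methods by the absolute value of the correlation.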

Highlights

Computational models of protein structure are usually inaccurate and exhibit significant deviations from the true structure
Most Model Quality Assessment Programs (MQAPs) were optimized for structures of crystallographic quality, and all 'non-physical' details contribute to their scores in unpredictable ways, either as very serious errors or as artificially positive elements
Our CASP7 results clearly demonstrate that utilization of 'crude' CASP models leads to decreased performance of MQAPs, compared to the 'idealized' variants of the same models

Summary

Introduction

Computational models of protein structure are usually inaccurate and exhibit significant deviations from the true structure. A number of predictive methods have been developed to discriminate between the globally incorrect and approximately correct models. Several Model Quality Assessment Programs (MQAPs) have been developed to detect local inaccuracies in unrefined crystallographic models, but it is not known if they are useful for computational models, which usually exhibit different and much more severe errors. Existing MQAPs are usually based either on a physical effective energy, which can be obtained from fundamental analysis of particle forces, or on an empirical pseudo-energy derived from known protein structures. Comparative models, especially those based on remotely related templates, often exhibit local inaccuracies that are difficult to identify by a global evaluation, in particular misthreadings of short regions (5–10 residues) corresponding to shifted alignments within individual secondary structure elements [7,8].

Objectives

Methods

Results

Conclusion