Contact prediction in protein modeling: Scoring, folding and refinement of coarse-grained models

Dorota Latek, Andrzej Kolinski

doi:10.1186/1472-6807-8-36

Abstract

Background: Several different methods for contact prediction succeeded within the Sixth Critical Assessment of Techniques for Protein Structure Prediction (CASP6). The most relevant were non-local contact predictions for targets from the most difficult categories: fold recognition-analogy and new fold. Such contacts could provide valuable structural information when a template structure cannot be found in the PDB.

Results: We describe comprehensive tests of the effectiveness of contact data in various aspects of de novo modeling with CABS, an algorithm used successfully in CASP6 by the Kolinski-Bujnicki group. We used the predicted contacts in a simple scoring function for the post-simulation ranking of protein models, and as a soft bias both in the folding simulations and in the fold-refinement procedure. The latter, fold refinement, turned out to be the most successful. The CABS force field used in the Replica Exchange Monte Carlo simulations cooperated with the true contacts and discriminated against the false ones, which improved the majority of the Kolinski-Bujnicki group's protein models. We tested different sets of predicted contacts submitted to the CASP6 server. According to our results, the best-performing contact sets balanced accuracy with coverage; they were obtained either from the two best predictors only or as a consensus of as many predictors as possible.

Conclusion: Our tests have shown that theoretically predicted contacts can be very beneficial for protein structure prediction. Depending on the protein modeling method, the contact data set should be prepared with a different balance between coverage and accuracy: high coverage is important for model ranking, and high accuracy for folding simulations.
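The abstract names two uses of predicted contacts (a scoring function for post-simulation ranking of models, and a soft bias during folding and refinement) and two ways of building contact sets (the best predictors only, or a consensus of many). The Python sketch below illustrates these ideas in their simplest generic form; it is not the CABS implementation. The 8 Å distance cutoff, the vote threshold, the flat-bottom quadratic form of the restraint, and all function names (`consensus_contacts`, `contact_score`, `restraint_energy`, `rank_models`) are hypothetical choices made for illustration.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Coord = Tuple[float, float, float]
Pair = Tuple[int, int]

# Assumed contact definition: two residues are in contact when their
# representative atoms (e.g. C-beta) are closer than 8 Angstrom.
CONTACT_CUTOFF = 8.0


def consensus_contacts(predictions: Iterable[Set[Pair]], min_votes: int) -> Set[Pair]:
    """Keep residue pairs proposed by at least `min_votes` predictors (hypothetical helper)."""
    votes: Counter = Counter()
    for predicted in predictions:
        # Normalize (j, i) to (i, j) so each predictor votes at most once per pair.
        votes.update({(min(i, j), max(i, j)) for i, j in predicted})
    return {pair for pair, n in votes.items() if n >= min_votes}


def contact_score(coords: Sequence[Coord], contacts: Iterable[Pair]) -> float:
    """Fraction of predicted contacts satisfied by a model; higher is better."""
    contacts = list(contacts)
    if not contacts:
        return 0.0
    satisfied = sum(1 for i, j in contacts
                    if math.dist(coords[i], coords[j]) < CONTACT_CUTOFF)
    return satisfied / len(contacts)


def restraint_energy(coords: Sequence[Coord], contacts: Iterable[Pair],
                     weight: float = 1.0) -> float:
    """Soft flat-bottom bias: zero inside the cutoff, quadratic beyond it.

    A generic restraint form assumed for illustration, not the CABS term.
    """
    energy = 0.0
    for i, j in contacts:
        excess = math.dist(coords[i], coords[j]) - CONTACT_CUTOFF
        if excess > 0:
            energy += weight * excess ** 2
    return energy


def rank_models(models: Dict[str, Sequence[Coord]], contacts: Set[Pair]) -> List[str]:
    """Order model names by decreasing contact score."""
    return sorted(models, key=lambda name: contact_score(models[name], contacts),
                  reverse=True)


# Toy usage: three predictors, two 51-residue models.
predictions = [{(0, 30), (2, 40)}, {(0, 30), (5, 50)}, {(30, 0), (2, 40)}]
contacts = consensus_contacts(predictions, min_votes=2)  # {(0, 30), (2, 40)}

extended = [(3.8 * k, 0.0, 0.0) for k in range(51)]
compact = list(extended)
compact[30] = (0.0, 5.0, 0.0)  # moved next to residue 0
compact[40] = (7.6, 5.0, 0.0)  # moved next to residue 2
models = {"extended": extended, "compact": compact}

print(rank_models(models, contacts))  # ['compact', 'extended']
```

In this sketch, raising `min_votes` keeps fewer but more widely supported pairs, which is one simple way to trade coverage for accuracy when preparing a contact set.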

Highlights

Several different methods for contact prediction succeeded within the Sixth Critical Assessment of Techniques for Protein Structure Prediction (CASP6)
Our work focuses on the contact-based structure prediction of targets from the two categories defined in CASP6[23]: New Fold (NF) and Fold Recognition – Analogy (FR/A), for which producing a reliable template structure was extremely difficult or impossible
The Spearman rank-order correlation was used, for example, by Feig et al. in the evaluation of CASP4 protein models obtained by different modeling methods, from comparative modeling to de novo folding[47]
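For reference, the sketch below computes Spearman's ρ as the Pearson correlation of average ranks, with tied values sharing the mean of the ranks they span. Pairing a contact-based score with RMSD to the native structure is an assumed example; with RMSD, a well-behaved score should give ρ close to -1. The helper names `average_ranks` and `spearman_rho` and the sample numbers are hypothetical; in practice `scipy.stats.spearmanr` computes the same statistic.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank-order correlation between two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined when one sample is constant")
    return cov / (sx * sy)


# Hypothetical example: contact scores vs. RMSD (Angstrom) for four models.
scores = [0.9, 0.7, 0.5, 0.2]
rmsd = [3.0, 5.0, 8.0, 12.0]
print(round(spearman_rho(scores, rmsd), 6))  # -1.0: higher score, lower RMSD
```

Because only ranks enter the calculation, the statistic judges whether a score orders models correctly, regardless of how the score and the quality measure are scaled.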

Summary

Introduction

Several different methods for contact prediction succeeded within the Sixth Critical Assessment of Techniques for Protein Structure Prediction (CASP6). Whereas short-range information, such as the type of secondary structure, can in most cases be predicted from the protein sequence with high accuracy (70–80%)[3], long-range contact predictions still have rather low accuracy (at most 20%, according to the CASP6 results[4]). Such low accuracy of contact predictions, though well above random (by a factor of more than 11)[5], is not … The main aim of this work was to establish to what extent this hypothesis is true.
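The accuracy figures quoted above, together with the balance between accuracy and coverage discussed in the abstract, can be made concrete with a small sketch. Here accuracy is the fraction of predicted contacts present in the native structure, coverage is the fraction of native contacts that were predicted, and the random baseline is the fraction of all residue pairs at the same minimum sequence separation that are native contacts. The 24-residue long-range threshold, the function and field names, and the toy target (which is invented, not CASP6 data) are assumptions chosen for illustration.

```python
from typing import Iterable, NamedTuple, Set, Tuple

Pair = Tuple[int, int]


class ContactStats(NamedTuple):
    accuracy: float          # fraction of predictions that are native contacts
    coverage: float          # fraction of native contacts that were predicted
    random_accuracy: float   # expected accuracy of picking pairs at random
    fold_over_random: float  # accuracy / random_accuracy


def long_range(pairs: Iterable[Pair], min_sep: int) -> Set[Pair]:
    """Normalize pairs to i < j and keep those separated by >= min_sep residues."""
    ordered = {(min(i, j), max(i, j)) for i, j in pairs}
    return {(i, j) for i, j in ordered if j - i >= min_sep}


def contact_stats(predicted: Iterable[Pair], native: Iterable[Pair],
                  n_residues: int, min_sep: int = 24) -> ContactStats:
    pred_set = long_range(predicted, min_sep)
    true_set = long_range(native, min_sep)
    hits = len(pred_set & true_set)
    accuracy = hits / len(pred_set) if pred_set else 0.0
    coverage = hits / len(true_set) if true_set else 0.0
    # Pairs (i, j) with j - i >= min_sep in a chain of n_residues:
    # sum over d = min_sep..n-1 of (n - d) = m (m + 1) / 2, with m = n - min_sep.
    m = max(n_residues - min_sep, 0)
    total_pairs = m * (m + 1) // 2
    random_accuracy = len(true_set) / total_pairs if total_pairs else 0.0
    fold = accuracy / random_accuracy if random_accuracy else float("nan")
    return ContactStats(accuracy, coverage, random_accuracy, fold)


# Hypothetical 100-residue target with 20 native long-range contacts.
native = {(i, i + 30) for i in range(20)}
# 10 long-range predictions (2 correct) plus one short-range pair that is filtered out.
predicted = {(0, 30), (1, 31), (5, 6)} | {(i, i + 50) for i in range(8)}
stats = contact_stats(predicted, native, n_residues=100)
print(stats.accuracy, stats.coverage)  # 0.2 0.1
```

Filtering by sequence separation before counting matters: short-range pairs are comparatively easy to predict and would otherwise inflate the apparent accuracy of non-local predictions.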

Objectives

Methods

Results

Conclusion