SELECTpro: effective protein model selection using a structure-based energy function resistant to BLUNDERs

Arlo Randall, Pierre Baldi

doi:10.1186/1472-6807-8-52

Abstract

Background: Protein tertiary structure prediction is a fundamental problem in computational biology, and identifying the most native-like model from a set of predicted models is a key subproblem. Consensus methods work well when the redundant models in the set are the most native-like, but fail when the most native-like model is unique. In contrast, structure-based methods score models independently and can be applied to model sets of any size and redundancy level. Additionally, structure-based methods have a variety of important applications, including analogous fold recognition, refinement of sequence-structure alignments, and de novo prediction. The purpose of this work was to develop a structure-based model selection method, based on predicted structural features, that could be applied successfully to any set of models.

Results: Here we introduce SELECTpro, a novel structure-based model selection method derived from an energy function comprising physical, statistical, and predicted structural terms. Novel and unique energy terms include predicted secondary structure, predicted solvent accessibility, predicted contact map, β-strand pairing, and side-chain hydrogen bonding.

SELECTpro participated in the new model quality assessment (QA) category in CASP7, submitted predictions for all 95 targets, and achieved top results. The average difference in GDT-TS between models ranked first by SELECTpro and the most native-like model was 5.07. This GDT-TS difference was less than 1% of the GDT-TS of the most native-like model for 18 targets, and less than 10% for 66 targets. SELECTpro also ranked the single most native-like model first for 15 targets, in the top five for 39 targets, and in the top ten for 53 targets, more often than any other method. Because the ranking metric is skewed by model redundancy and ignores poor models that are ranked better than the most native-like model, the BLUNDER metric is introduced to overcome these limitations.

SELECTpro is also evaluated on a recent benchmark set of 16 small proteins with large decoy sets of 12500 to 20000 models for each protein, where it outperforms the benchmarked method (I-TASSER).

Conclusion: SELECTpro is an effective model selection method that scores models independently and is appropriate for use on any model set. SELECTpro is available for download as a stand-alone application at: . SELECTpro is also available as a public server at the same site.
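The per-target quantities reported in the abstract (the GDT-TS difference between the top-ranked and the most native-like model, that difference as a percentage of the best GDT-TS, and the rank given to the most native-like model) can be illustrated with a minimal Python sketch. The `Model` class, the `evaluate_selection` function, the convention that a lower energy means a better rank, and all numbers below are assumptions made for illustration. They are not taken from SELECTpro or from the CASP7 data.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    energy: float  # hypothetical selection score; lower is assumed to be better
    gdt_ts: float  # GDT-TS against the native structure, on a 0-100 scale


def evaluate_selection(models):
    """Return (gdt_loss, relative_loss_percent, rank_of_best) for one target."""
    ranked = sorted(models, key=lambda m: m.energy)   # best-scored model first
    top = ranked[0]                                   # model the method selects
    best = max(models, key=lambda m: m.gdt_ts)        # most native-like model
    gdt_loss = best.gdt_ts - top.gdt_ts
    relative_loss = 100.0 * gdt_loss / best.gdt_ts    # percent of the best GDT-TS
    rank_of_best = ranked.index(best) + 1             # 1-based rank
    return gdt_loss, relative_loss, rank_of_best


# Toy data for a single target (illustrative values only).
models = [
    Model("m1", energy=-120.0, gdt_ts=62.0),
    Model("m2", energy=-135.0, gdt_ts=58.0),
    Model("m3", energy=-110.0, gdt_ts=64.0),
    Model("m4", energy=-90.0, gdt_ts=40.0),
]
loss, rel, rank = evaluate_selection(models)
print(f"GDT-TS loss: {loss:.2f}, relative loss: {rel:.1f}%, rank of best: {rank}")
```

Averaging `gdt_loss` over all targets would give a figure comparable in kind to the 5.07 reported above, and counting targets with `rank_of_best` equal to 1, at most 5, or at most 10 would give the ranking tallies.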

Highlights

Protein tertiary structure prediction is a fundamental problem in computational biology and identifying the most native-like model from a set of predicted models is a key subproblem
Table notes: (a) The number of targets where the quality assessment (QA) group made a valid prediction (NT), with the number of domains of these targets (ND) in parentheses. (*) SELECTpro (699_1) results appear in bold face, and all results that are better than SELECTpro are underlined.
Model Quality Assessment Programs (MQAPs) that can select the most native-like model from a set of possibilities have a variety of applications in protein structure prediction

Summary

Introduction

Protein tertiary structure prediction is a fundamental problem in computational biology, and identifying the most native-like model from a set of predicted models is a key subproblem. Consensus methods work well when the redundant models in the set are the most native-like, but fail when the most native-like model is unique. Structure-based methods score models independently and can be applied to model sets of any size and redundancy level. The purpose of this work was to develop a structure-based model selection method, based on predicted structural features, that could be applied successfully to any set of models. Model quality assessment programs (MQAPs) address this selection task and can be divided roughly into three categories based on the type of information they use: evolutionary methods use sequence or profile similarity between the target sequence and a template, consensus methods use similarity between models, and structure-based methods use model coordinates [1].
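The failure mode of consensus methods described above can be made concrete with a small Python sketch. It assumes a simple consensus score, the mean pairwise similarity of a model to all other models in the set. The function name `consensus_scores`, the similarity matrix, and the native-similarity values are hypothetical toy inputs, not the scoring of any published method.

```python
def consensus_scores(similarity):
    """Score each model by its mean similarity to all other models in the set."""
    n = len(similarity)
    return [
        sum(similarity[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


# Toy pairwise similarities (0-1): models 0-2 are near-duplicates, model 3 is unique.
similarity = [
    [1.0, 0.9, 0.9, 0.3],
    [0.9, 1.0, 0.9, 0.3],
    [0.9, 0.9, 1.0, 0.3],
    [0.3, 0.3, 0.3, 1.0],
]
# Hypothetical similarity of each model to the native structure.
native_similarity = [0.5, 0.5, 0.5, 0.8]

scores = consensus_scores(similarity)
consensus_pick = max(range(len(scores)), key=scores.__getitem__)
best_model = max(range(len(native_similarity)), key=native_similarity.__getitem__)
print("consensus pick:", consensus_pick, "most native-like:", best_model)
```

In this toy set the redundant cluster outvotes the unique model, so the consensus pick differs from the most native-like model. A structure-based score is computed from one model's coordinates alone, so adding or removing duplicates of other models would not change it.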

Objectives

Methods

Results

Conclusion