Abstract

Background: Accurately predicted contacts make it possible to compute the 3D structure of a protein. Since the solution space of native residue-residue contact pairs is very large, it is necessary to leverage information to identify relevant regions of the solution space, i.e. correct contacts. Every additional source of information can contribute to narrowing down candidate regions. Therefore, recent methods have combined evolutionary and sequence-based information as well as evolutionary and physicochemical information. We develop a new contact predictor (EPSILON-CP) that goes beyond current methods by combining evolutionary, physicochemical, and sequence-based information. The problems resulting from the increased dimensionality and complexity of the learning problem are addressed with a careful feature analysis, which results in a drastically reduced feature set. The different information sources are combined using deep neural networks.

Results: On 21 hard CASP11 FM targets, EPSILON-CP achieves a mean precision of 35.7% for the top-L/10 predicted long-range contacts, which is 11% better than the CASP11 winning version of MetaPSICOV. For the top 1.5L contacts, the improvement is 17%. Furthermore, in this study we find that the amino acid composition, a commonly used feature, is rendered ineffective in the context of meta approaches. The size of the refined feature set decreased by 75%, which enabled a substantial increase in training data for machine learning and contributed significantly to the observed improvements.

Conclusions: Exploiting as much diverse information as possible is key to accurate contact prediction. Simply merging the information introduces new challenges. Our study suggests that critical feature analysis can improve the performance of contact prediction methods that combine multiple information sources. EPSILON-CP is available as a webservice: http://compbio.robotics.tu-berlin.de/epsilon/

Highlights

  • Predicted contacts make it possible to compute the 3D structure of a protein

  • We develop a new representation with drastically reduced dimensionality that translates into a deep neural network predictor with improved performance. We show that this approach reaches a mean precision of 35.7% for the top L/10 long-range contacts (where L is the length of the protein) on 21 CASP11 free modeling target domains, 11% better than the CASP11 winning version of MetaPSICOV

  • We compare our method to the CASP11 version of MetaPSICOV [23], which outperformed all other methods in CASP11
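
The top-L/10 long-range precision quoted above can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function name, the input layout, and the default sequence separation of 24 residues for "long-range" (a common CASP convention that this text does not state) are all assumptions made for illustration.

```python
def top_lk_precision(predictions, native_contacts, length, k=10, min_separation=24):
    """Precision of the top L/k long-range predicted contacts.

    predictions     -- iterable of (i, j, score) residue-index pairs (hypothetical layout)
    native_contacts -- set of (i, j) pairs with i < j that are true contacts
    length          -- protein length L
    min_separation  -- assumed sequence separation defining "long-range"
    """
    # Keep only long-range pairs and normalise each pair so that i < j.
    long_range = [
        (min(i, j), max(i, j), score)
        for i, j, score in predictions
        if abs(i - j) >= min_separation
    ]
    # Rank by predicted score, highest first.
    long_range.sort(key=lambda item: item[2], reverse=True)

    # Evaluate the L/k best-scoring pairs (at least one).
    n_top = max(1, length // k)
    top = long_range[:n_top]
    if not top:
        return 0.0

    hits = sum(1 for i, j, _ in top if (i, j) in native_contacts)
    # Divides by the number of pairs actually evaluated.
    return hits / len(top)
```

With `k=10` this corresponds to top L/10; the top 1.5L figure would use the `1.5 * L` best-scoring pairs instead, which this sketch does not cover because it takes an integer divisor.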

Summary

Introduction

Predicted contacts make it possible to compute the 3D structure of a protein. Recent methods have combined evolutionary and sequence-based information as well as evolutionary and physicochemical information. We develop a new contact predictor (EPSILON-CP) that goes beyond current methods by combining evolutionary, physicochemical, and sequence-based information. The different information sources are combined using deep neural networks. Contact prediction methods identify residue pairs that are in spatial proximity in the native structure of a protein. Contacts can be used as constraints to guide ab initio methods [1,2,3,4,5] and to reconstruct the 3D structure of a protein [6,7,8,9,10,11]. Many different sources of information are used in contact prediction.
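
To make "spatial proximity in the native structure" concrete, the following Python sketch derives a set of native long-range contacts from residue coordinates. The 8 Å distance cutoff and the 24-residue minimum sequence separation are widely used conventions assumed here; this text specifies neither. The function name and the use of one representative coordinate per residue are likewise illustrative.

```python
import math


def native_contacts(coords, threshold=8.0, min_separation=24):
    """Return the set of residue pairs (i, j), i < j, that are in contact.

    coords         -- list of (x, y, z) tuples, one representative atom per residue
    threshold      -- assumed distance cutoff in angstroms
    min_separation -- assumed minimum sequence separation |i - j|
    """
    contacts = set()
    n = len(coords)
    for i in range(n):
        # Only pairs at least min_separation apart in sequence are considered.
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < threshold:
                contacts.add((i, j))
    return contacts
```

A set built this way can serve as the reference against which predicted contacts are scored.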
