Abstract
We introduce a novel contact prediction method that achieves high prediction accuracy by combining evolutionary and physicochemical information about native contacts. We obtain evolutionary information from multiple-sequence alignments and physicochemical information from predicted ab initio protein structures. These structures represent low-energy states in an energy landscape and thus capture the physicochemical information encoded in the energy function. Such low-energy structures are likely to contain native contacts, even if their overall fold is not native. To differentiate native from non-native contacts in those structures, we develop a graph-based representation of the structural context of contacts. We then use this representation to train a support vector machine classifier to identify the most likely native contacts in otherwise non-native structures. The resulting contact predictions are highly accurate. Because we combine two sources of information, evolutionary and physicochemical, we maintain prediction accuracy even when only a few sequence homologs are available. We show that the predicted contacts help to improve ab initio structure prediction. A web service is available at http://compbio.robotics.tu-berlin.de/epc-map/.
Highlights
Protein contact prediction identifies potential residue pairs in spatial proximity in the native protein, without knowledge of the native structure itself. Accurate contact prediction is of great interest and value, as even partial knowledge of residue-residue contacts for a target protein enables the computation of that protein’s native structure [1,2].
We introduce a novel contact prediction method, EPC-map, that predicts contacts using two sources of information: evolutionary information from multiple sequence alignments and information from physicochemical energy potentials (EPC-map stands for using Evolutionary and Physicochemical information to predict Contact maps).
We presented EPC-map, a contact prediction method that achieves unprecedented prediction accuracy by combining evolutionary information from multiple-sequence alignments with physicochemical information from structure prediction methods.
Summary
Accurate contact prediction is of great interest and value, as even partial knowledge of residue-residue contacts for a target protein enables the computation of that protein’s native structure [1,2]. Information about native contacts can be used to guide conformational space search in ab initio protein structure prediction [3,4]. There are five broad categories of contact prediction methods: contact prediction from evolutionary information, from sequence-based machine-learning algorithms, from template structures, from structure prediction decoys, and by integrating sequence and structural restraints. They differ in the type of information they use to make predictions.
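To make the notion of a residue-residue contact concrete, the following minimal sketch derives a contact set from one representative 3D coordinate per residue and scores a list of predicted contacts against it. It is an illustration only, not the EPC-map procedure: the function names, the 8 Å distance cutoff, and the minimum sequence separation of 6 are assumptions chosen for the example and are not taken from the text above.

```python
import math


def contact_map(coords, threshold=8.0, min_separation=6):
    """Return the set of residue index pairs (i, j), i < j, that are in contact.

    coords: one (x, y, z) tuple per residue, e.g. a representative atom.
    threshold: distance cutoff in angstroms (assumed value, not from the source).
    min_separation: minimum |i - j| along the sequence (assumed value).
    """
    contacts = set()
    n = len(coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < threshold:
                contacts.add((i, j))
    return contacts


def precision(predicted, native):
    """Fraction of predicted contacts that also occur in the native contact set."""
    if not predicted:
        return 0.0
    return len(set(predicted) & native) / len(predicted)


# Toy example: five residues on a straight line, 3.8 angstroms apart.
toy_coords = [(3.8 * i, 0.0, 0.0) for i in range(5)]
toy_native = contact_map(toy_coords, threshold=8.0, min_separation=2)
toy_score = precision([(0, 2), (0, 4)], toy_native)
```

In the toy chain, residues two positions apart lie 7.6 Å from each other and count as contacts, while residues three or more positions apart do not. Precision over a ranked list of predicted pairs, computed this way, is one common way to quantify what the text calls prediction accuracy.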