Abstract

The amino acid sequence of a protein encodes the blueprint of its native structure. Predicting the corresponding structural fold from the protein’s sequence is one of the most challenging problems in computational biology. In this work, we introduce DESTINI (deep structural inference for proteins), a novel computational approach that combines a deep-learning algorithm for protein residue/residue contact prediction with template-based structural modelling. For the first time, significantly improved predictive ability is demonstrated in large-scale tertiary structure prediction of over 1,200 single-domain proteins. DESTINI successfully predicts the tertiary structure of four times the number of “hard” targets (those with poor quality templates) that were previously intractable, viz., a “glass ceiling” for previous template-based approaches, and also improves model quality for “easy” targets (those with good quality templates). The significantly better performance of DESTINI is largely due to the incorporation of better contact prediction into template modelling. Systematic clustering, carried out to understand why deep learning achieves more accurate contact prediction, reveals that deep learning predicts more coherent, native-like contact patterns than co-evolutionary analysis. Taken together, this work presents a promising strategy towards solving the protein structure prediction problem.

Highlights

  • The protein folding problem asks the following question: given the amino acid sequence of a protein, can one predict its corresponding 3D structure? Since this question was raised in the 1960s[1], a tremendous amount of research effort has been invested towards its solution.

  • An early study estimates that, for single-domain proteins of fewer than 200 amino acids (AAs), one can assemble a structural model within 5 Å root-mean-square deviation (RMSD) of the native structure if more than L/4 long-range protein contacts are known[8], where L is the length of the protein.

  • The contact predictions are supplied to structural modelling, the second component of DESTINI, which is a further development of the TASSER^VMT approach[5].
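
The L/4 estimate in the highlights above can be made concrete with a small check on a contact map. The sketch below is illustrative only and is not part of DESTINI: the 8 Å distance cutoff and the minimum sequence separation of 24 residues for a "long-range" contact are common conventions assumed here, not values stated in the text, and the function names are hypothetical.

```python
import numpy as np

def contact_map(coords, cutoff=8.0):
    """Boolean L x L map: True where two residues lie within `cutoff` angstroms.

    `coords` is an (L, 3) array with one representative atom per residue.
    The 8 A cutoff is an assumed convention, not taken from the text.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return dist < cutoff

def count_long_range(cmap, min_sep=24):
    """Count contacts between residues i < j with j - i >= min_sep.

    min_sep=24 is an assumed definition of "long-range".
    """
    i, j = np.triu_indices(cmap.shape[0], k=min_sep)
    return int(cmap[i, j].sum())

def meets_quarter_L(cmap, min_sep=24):
    """True if the map holds more than L/4 long-range contacts."""
    L = cmap.shape[0]
    return count_long_range(cmap, min_sep) > L / 4
```

For a 200-residue protein, the criterion corresponds to more than 50 known long-range contacts.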


Summary

Introduction

The protein folding problem asks the following question: given the amino acid sequence of a protein, can one predict its corresponding 3D structure? Since this question was raised in the 1960s[1], a tremendous amount of research effort has been invested towards its solution.

Since the contact map requires dense pixel-level labelling, not a simple classification of the whole image as in image recognition, one needs to apply a segment-based classification algorithm using fully convolutional networks (FCNs)[32]. Very recently, this idea has been introduced by a couple of groups whose designs differ somewhat but which all use fully convolutional networks to predict the protein contact map[33,34]. As shown in the most recent 12th Critical Assessment of Structure Prediction (CASP) competition[35], these new methods achieved significant improvement over previous methods based on co-evolutionary analysis or “shallow” machine-learning techniques.
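
To illustrate what dense pixel-level labelling means here, the following is a minimal NumPy sketch of a two-layer fully convolutional network that maps an L × L grid of pairwise features to an L × L grid of contact probabilities. It is a toy under stated assumptions and not the DESTINI architecture or that of the cited methods: the layer sizes, the parameter dictionary keys, and the function names are all hypothetical.

```python
import numpy as np

def conv2d_same(x, w, b):
    """Zero-padded 'same' 2D convolution (odd kernel size assumed).

    x: (H, W, Cin), w: (k, k, Cin, Cout), b: (Cout,) -> (H, W, Cout)
    """
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))  # zero padding on both spatial axes
    H, W, _ = x.shape
    out = np.empty((H, W, w.shape[-1]))
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + k, j:j + k, :]
            # Contract the (k, k, Cin) patch against the kernel.
            out[i, j] = np.tensordot(patch, w, axes=3) + b
    return out

def tiny_fcn(features, params):
    """Return an (L, L) map with one contact probability per residue pair."""
    h = np.maximum(conv2d_same(features, params["w1"], params["b1"]), 0.0)  # ReLU
    logits = conv2d_same(h, params["w2"], params["b2"])[..., 0]
    logits = 0.5 * (logits + logits.T)  # contacts are symmetric in (i, j)
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid
```

Because the network contains only convolutions, the same weights apply to a protein of any length L and produce one label per pixel, whereas a whole-image classifier would reduce the input to a single label.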

