Abstract

The prediction of amino acid contacts from protein sequence is an important problem, because predicted contacts are a vital step towards predicting folded protein structures. We propose that ensembling, a powerful concept from deep learning, can increase the accuracy of protein contact predictions by combining the outputs of different neural network models. We show that an ensemble of the predictions made by different groups at the recent Critical Assessment of Protein Structure Prediction (CASP13) outperforms every individual group. Further, we show that contacts derived from the distance predictions of three additional deep neural networks (AlphaFold, trRosetta, and ProSPr) can be substantially improved by ensembling all three networks. We also show that ensembling these recent deep neural networks with the best CASP13 group creates a superior contact prediction tool. Finally, we demonstrate that an ensemble of two networks can successfully differentiate between the folds of two highly homologous sequences. To build on these findings, we propose the creation of a better protein contact benchmark set and additional open-source contact prediction methods.
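As a minimal sketch of the ensembling idea, assume each model outputs an L × L matrix of contact probabilities for the same sequence; the outputs can then be combined by an (optionally weighted) average. The helper name `ensemble_contact_maps` and the plain averaging rule are illustrative assumptions, not necessarily the combination scheme used in this work.

```python
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def ensemble_contact_maps(
    maps: Sequence[Matrix], weights: Optional[Sequence[float]] = None
) -> Matrix:
    """Hypothetical helper: combine L x L contact-probability maps from several
    models by a (weighted) arithmetic mean. Assumes every map covers the same
    sequence, so all maps share the same L."""
    if not maps:
        raise ValueError("need at least one contact map")
    n = len(maps[0])
    for m in maps:
        if len(m) != n or any(len(row) != n for row in m):
            raise ValueError("all maps must be L x L with the same L")
    if weights is None:
        weights = [1.0] * len(maps)  # assumption: equal weight per model
    if len(weights) != len(maps):
        raise ValueError("one weight per map is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Entry (i, j) of the ensemble is the weighted mean of entry (i, j) across models.
    return [
        [sum(w * m[i][j] for w, m in zip(weights, maps)) / total for j in range(n)]
        for i in range(n)
    ]


# Usage with two toy 2 x 2 maps (not real model output):
model_a = [[0.0, 1.0], [1.0, 0.0]]
model_b = [[0.0, 0.5], [0.5, 0.0]]
print(ensemble_contact_maps([model_a, model_b]))  # [[0.0, 0.75], [0.75, 0.0]]
```

Averaging probabilities keeps every entry in [0, 1]; non-uniform weights could, for example, favor a model known to be more accurate on held-out targets.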

Highlights

  • The prediction of amino acid contacts from protein sequence is an important problem, because predicted contacts are a vital step towards predicting folded protein structures

  • The biennial Critical Assessment of Protein Structure Prediction (CASP) defines two nonadjacent amino acids to be in contact if the distance between their Cβ atoms (Cα for glycine) is less than 8 Å in the folded structure

  • Because predicted distances can be converted into contact predictions, we compared the performance of three new deep learning methods (AlphaFold[15], trRosetta[16], and ProSPr[17]) that did not contribute to the CASP13 contact assessment against the participating groups, both individually and as an ensemble
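The CASP contact definition in the highlights can be applied to a known structure as in the following sketch. It assumes the caller has already extracted one Cβ coordinate per residue (Cα for glycine) in ångströms, and it interprets "nonadjacent" as a minimum sequence separation of 6 residues, the smallest separation CASP contact assessment considers. The function name and that default are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_map_from_structure(
    cb_coords: Sequence[Coord],
    threshold: float = 8.0,
    min_separation: int = 6,
) -> List[List[int]]:
    """Hypothetical helper: return a symmetric L x L 0/1 matrix in which
    entry (i, j) is 1 when residues i and j are at least `min_separation`
    positions apart in sequence and their Cβ atoms (Cα for glycine) are
    closer than `threshold` ångströms."""
    if min_separation < 1:
        raise ValueError("min_separation must be at least 1")
    n = len(cb_coords)
    contacts = [[0] * n for _ in range(n)]
    for i in range(n):
        # Start at i + min_separation so close neighbours in sequence are skipped.
        for j in range(i + min_separation, n):
            if math.dist(cb_coords[i], cb_coords[j]) < threshold:
                contacts[i][j] = contacts[j][i] = 1
    return contacts


# Toy example: ten residues spaced 1 Å apart on a straight line (not a real protein).
line = [(float(k), 0.0, 0.0) for k in range(10)]
cmap = contact_map_from_structure(line)
print(cmap[0][6], cmap[0][7], cmap[0][8])  # 1 1 0
```

The strict `<` comparison mirrors the "less than 8 Å" wording, so a pair exactly 8 Å apart is not counted as a contact.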

Summary

Introduction

The prediction of amino acid contacts from protein sequence is an important problem, because predicted contacts are a vital step towards predicting folded protein structures. The prediction of protein structure from primary sequence is a long-standing challenge that has recently seen major advances through two-stage folding pipelines[3]. Such two-stage methods first use machine learning to predict one- and two-dimensional protein structure annotations[4] (PSAs), such as amino acid contact or distance probabilities, and then use these annotations to guide folding into a three-dimensional structure. Because predicted distances can be converted into contact predictions (see Methods), we compared the performance of three new deep learning methods (AlphaFold[15], trRosetta[16], and ProSPr[17]) that did not contribute to the CASP13 contact assessment against the participating groups, both individually and as an ensemble.
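One common way to perform the distance-to-contact conversion mentioned above, shown here as a sketch rather than the exact procedure in the Methods, assumes each network outputs a distogram: for every residue pair, a probability distribution over distance bins. The contact probability is then the probability mass in bins lying entirely below the 8 Å CASP threshold. The bin layout and helper names below are hypothetical; real networks define their own bin layouts.

```python
from typing import List, Sequence

Distogram = List[List[List[float]]]  # L x L x B bin probabilities


def contact_prob_from_bins(
    bin_probs: Sequence[float],
    bin_upper_edges: Sequence[float],
    threshold: float = 8.0,
) -> float:
    """Sum the probability of every distance bin whose upper edge is at or
    below `threshold` (Å). With half-open bins [lower, upper), these are
    exactly the bins covering distances under the threshold."""
    if len(bin_probs) != len(bin_upper_edges):
        raise ValueError("one upper edge per bin is required")
    return sum(p for p, upper in zip(bin_probs, bin_upper_edges) if upper <= threshold)


def contact_map_from_distogram(
    distogram: Distogram,
    bin_upper_edges: Sequence[float],
    threshold: float = 8.0,
) -> List[List[float]]:
    """Hypothetical helper: apply contact_prob_from_bins to every residue pair."""
    return [
        [contact_prob_from_bins(cell, bin_upper_edges, threshold) for cell in row]
        for row in distogram
    ]


# Hypothetical 5-bin layout: [0,4), [4,6), [6,8), [8,10), [10,inf) Å.
edges = [4.0, 6.0, 8.0, 10.0, float("inf")]
probs = [0.25, 0.25, 0.25, 0.125, 0.125]
print(contact_prob_from_bins(probs, edges))  # 0.75
```

Because every network's output is reduced to the same L × L contact-probability form, maps from different networks become directly comparable and can be combined by ensembling.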
