Abstract

Rapid, accurate prediction of protein structure from amino acid sequence would accelerate fields as diverse as drug discovery, synthetic biology and disease diagnosis. Massively improved prediction of protein structures has been driven by improving the prediction of the amino acid residues that are in contact in the 3D structure. For an average globular protein, around 92% of all residue pairs are non-contacting; accurate prediction of even a small percentage of inter-amino acid distances could therefore increase the number of constraints available to guide structure determination. We have trained deep neural networks to predict inter-residue contacts and distances. Distances are predicted with an accuracy better than that of most contact prediction techniques. Addition of distance constraints improved de novo structure predictions for test sets of 158 protein structures, as compared to using the best contact prediction methods alone. Importantly, the use of distance predictions allows the selection of better models from the structure pool without the need for an external model assessment tool. The results also indicate how the accuracy of distance prediction methods might be improved further.

Highlights

  • The problem of predicting protein structure from amino acid sequence has been transformed in the last decade from one of aspiration to one of application, but prediction methods are not yet a routine laboratory tool

  • For the test set with 108 proteins, distance prediction accuracies are better than the contact prediction accuracies of many other methods

  • For the test set with 50 proteins, distance prediction accuracies are better than the contact prediction accuracies of MetaPSICOV, but not as high as the RaptorX convolutional neural network (S4 Fig)


Summary

Introduction

The problem of predicting protein structure from amino acid sequence has been transformed in the last decade from one of aspiration to one of application, but prediction methods are not yet a routine laboratory tool. The authors benchmarked the time for predicting the structure of a 200 amino acid protein as 13 000 CPU core hours, which amounts to around 5 days of processing on 100 cores in a supercomputing cluster, or around 50 to 100 days on a typical desktop machine. This cost makes structure prediction inaccessible to non-specialists and prevents broader exploitation, e.g. high-throughput protein structure prediction.
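The conversion from core hours to wall-clock time quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes perfect parallel scaling, and the desktop core counts (6 and 11) are hypothetical values chosen to bracket the quoted 50 to 100 day range, not figures from the benchmark.

```python
CORE_HOURS = 13_000  # benchmark figure quoted in the text


def wall_clock_days(core_hours: float, cores: int) -> float:
    """Idealized wall-clock time in days, assuming perfect parallel scaling."""
    return core_hours / cores / 24


# Cluster case from the text: 100 cores.
cluster_days = wall_clock_days(CORE_HOURS, 100)

# Desktop case: core counts are assumptions, not from the source.
desktop_days_fast = wall_clock_days(CORE_HOURS, 11)
desktop_days_slow = wall_clock_days(CORE_HOURS, 6)

print(f"100-core cluster: {cluster_days:.1f} days")
print(f"desktop (assumed 6 to 11 cores): "
      f"{desktop_days_fast:.0f} to {desktop_days_slow:.0f} days")
```

Under these assumptions the cluster figure comes out slightly above 5 days, consistent with the "around 5 days" in the text, and a desktop with roughly 6 to 11 usable cores lands in the quoted 50 to 100 day range.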

