Abstract

Computational protein design has a wide variety of applications. Despite its remarkable success, designing a protein for a given structure and function remains challenging. Meanwhile, the number of solved protein structures is increasing rapidly while the number of unique protein folds has plateaued, suggesting that more structural information is accumulating on each fold. Deep neural networks are a powerful way to learn from such large data sets and have shown superior performance in many machine learning fields. In this study, we applied deep neural networks to computational protein design, predicting the probability of each of the 20 natural amino acids at every residue position in a protein. A large set of protein structures was collected and a multi-layer neural network was constructed. A number of structural properties were extracted as input features, and the best network achieved an accuracy of 38.3%. Using the network output as residue type restraints improved the average sequence identity when designing three natural proteins with Rosetta. Moreover, the predictions from our network show ~3% higher sequence identity than a previous method. Results from this study may benefit further development of computational protein design methods.

Highlights

  • Proteins perform a vast number of functions in cells, including signal transduction, DNA replication, and catalyzing reactions

  • As of July 2017, the Protein Data Bank (PDB)[29] held ~132,000 structures and was growing by ~10,000 per year, yet the number of unique folds had not changed in the past few years; more data are therefore accumulating on each fold, and statistical learning from existing structures is likely to improve design methods[30,31]

  • SPIN was trained on 1532 non-redundant proteins and reaches a sequence identity of 30.3% on a test set containing 500 proteins


Summary

Deep Learning Neural Networks

Computational protein design has a wide variety of applications. Despite its remarkable success, designing a protein for a given structure and function remains challenging. Two statistical potentials for protein design have been developed[32,33], and the ABACUS potential[34] has been used successfully in designing proteins[33,35]. While these statistical potentials have a physical basis, machine learning, especially deep neural networks, has recently become a popular method to analyze big data sets, extract complex features, and make accurate predictions[36]. We applied deep neural networks to computational protein design using new structural features, a new network architecture, and a larger protein structure data set, with the aim of improving design accuracy. We compared the performance of the network across different input setups and investigated how its outputs can be used in protein design.
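To make the quantities above concrete, the following is a minimal sketch of how per-residue amino acid probabilities can be turned into a predicted sequence and scored by sequence identity against the native sequence. It is not the authors' implementation: the softmax output layer, the function names, and the toy scores are all assumptions made for illustration, and the summary does not specify how the network's outputs are normalized.

```python
import math

# One-letter codes for the 20 natural amino acids (ordering is an arbitrary choice here).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1 (assumed output layer)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def predict_sequence(per_residue_scores):
    """Pick the most probable amino acid at each residue position."""
    letters = []
    for scores in per_residue_scores:
        probs = softmax(scores)
        best = max(range(len(probs)), key=probs.__getitem__)
        letters.append(AMINO_ACIDS[best])
    return "".join(letters)


def sequence_identity(predicted, native):
    """Fraction of positions where the predicted residue equals the native one."""
    if len(predicted) != len(native):
        raise ValueError("sequences must have the same length")
    if not native:
        return 0.0
    matches = sum(p == n for p, n in zip(predicted, native))
    return matches / len(native)


def toy_scores(letter, peak=2.0):
    """Hypothetical network output: one amino acid scores higher than the rest."""
    return [peak if aa == letter else 0.0 for aa in AMINO_ACIDS]


# Toy example: the "network" favors A, C, D, E at four positions.
predicted = predict_sequence([toy_scores(c) for c in "ACDE"])
identity = sequence_identity(predicted, "ACDF")
print(predicted, identity)
```

In this toy case three of the four predicted residues match the native sequence. Reported figures such as the 30.3% sequence identity for SPIN are averages of this kind of per-position agreement over a test set.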

