Abstract

The task of protein sequence design is central to nearly all rational protein engineering problems, and enormous effort has gone into the development of energy functions to guide design. Here, we investigate the capability of a deep neural network model to automate design of sequences onto protein backbones, having learned directly from crystal structure data and without any human-specified priors. The model generalizes to native topologies not seen during training, producing experimentally stable designs. We evaluate the generalizability of our method to a de novo TIM-barrel scaffold. The model produces novel sequences, and high-resolution crystal structures of two designs show excellent agreement with in silico models. Our findings demonstrate the tractability of an entirely learned method for protein sequence design.

Highlights

  • The task of protein sequence design is central to most rational protein engineering problems, and enormous effort has gone into the development of energy functions to guide design

  • The backbone is fully specified by the positions of each residue's four backbone atoms (N, Cα, C, O) and the C-terminal oxygen atom, whose positions are encoded as $X \in \mathbb{R}^{(4n+1) \times 3}$; the final conditional distribution we are interested in modeling is: $P(Y \mid X) = p(y_{i=1}, \ldots, y_n \mid X)$

  • Circular dichroism (CD) spectra for the top model designs match the native spectra well, and the designs were also found to be more thermally stable than the native proteins (Fig. 2J and Supplementary Fig. 13). These results indicate that the neural network model generalizes to topologies that are strictly unseen by the model during training
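
The backbone encoding described in the highlights above can be illustrated with a minimal sketch. It assumes that rows are ordered residue by residue as N, Cα, C, O, with the C-terminal oxygen appended last; that ordering, the function name `encode_backbone`, and the toy coordinates are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def encode_backbone(
    residues: Sequence[Sequence[Coord]],
    c_terminal_oxygen: Coord,
) -> List[Coord]:
    """Flatten per-residue backbone atoms into 4n + 1 rows of 3 coordinates.

    Assumed (hypothetical) layout: each residue contributes its N, CA, C, O
    atoms in that order, and the C-terminal oxygen is the final row.
    """
    rows: List[Coord] = []
    for atoms in residues:
        if len(atoms) != 4:
            raise ValueError("each residue needs exactly four backbone atoms")
        for x, y, z in atoms:
            rows.append((float(x), float(y), float(z)))
    ox, oy, oz = c_terminal_oxygen
    rows.append((float(ox), float(oy), float(oz)))
    return rows


# Toy two-residue backbone with made-up coordinates (not a real structure).
toy_residues = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.0), (1.3, 2.4, 0.0)],
    [(3.3, 1.5, 0.0), (4.0, 2.8, 0.0), (5.5, 2.6, 0.0), (6.1, 1.5, 0.0)],
]
X = encode_backbone(toy_residues, (6.2, 3.7, 0.0))
print(len(X), len(X[0]))  # 9 3, i.e. (4n + 1) rows by 3 columns for n = 2
```

For n residues the result has 4n + 1 rows and 3 columns, matching the shape of X stated above. A model of P(Y | X) would take such an array as its conditioning input and return a distribution over the n residue identities.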


Summary

Introduction

The task of protein sequence design is central to most rational protein engineering problems, and enormous effort has gone into the development of energy functions to guide design. Computational protein design has emerged as a powerful tool for rational protein design, enabling significant achievements in the engineering of therapeutics[1,2,3], biosensors[4,5,6], enzymes[7,8], and more[9,10,11]. Key to such successes are robust sequence design methods that minimize the folded-state energy of a pre-specified backbone conformation, which can either be derived from existing structures or generated de novo. We explore an approach for sequence design guided only by a neural network that explicitly models side-chain conformers in a structure-based context (Fig. 1A), and we assess its generalization to unseen native topologies and to a de novo TIM-barrel protein backbone. The model produces novel sequences, and the high-resolution crystal structures of two designs show excellent agreement with in silico models.

