Abstract
The design of novel proteins has many applications but remains an attritional process with success in isolated cases. Meanwhile, deep learning technologies have exploded in popularity in recent years and are increasingly applicable to biology due to the rise in available data. We attempt to link protein design and deep learning by using variational autoencoders to generate protein sequences conditioned on desired properties. Potential copper and calcium binding sites are added to non-metal binding proteins without human intervention and compared to a hidden Markov model. In another use case, a grammar of protein structures is developed and used to produce sequences for a novel protein topology. One candidate structure is found to be stable by molecular dynamics simulation. The ability of our model to confine the vast search space of protein sequences and to scale easily has the potential to assist in a variety of protein design tasks.
Highlights
The computational design and redesign of proteins provides a route to create new protein structures and functions[1,2]
To carry out protein design tasks, we developed a conditional variational autoencoder (CVAE) that is able to generate protein sequences with certain properties
At its simplest, the model acts as an encoder and decoder of protein sequences: for the model conditioned on metal binding sites, the mean sequence identity between training set sequences and their encoded-decoded forms is 49.5%
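The reconstruction figure quoted above is a mean sequence identity between each training sequence and its encoded-decoded form. A minimal sketch of how such a figure could be computed is given below. It assumes that both sequences in a pair have the same length and are compared position by position; the exact definition used by the authors (for example, how padding or alignment is handled) is not stated in this text. The function names and the toy sequences are hypothetical and are not data from the paper.

```python
from typing import Iterable, Tuple


def sequence_identity(original: str, decoded: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    if len(original) != len(decoded):
        raise ValueError("sequences must have equal length")
    if not original:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(original, decoded))
    return matches / len(original)


def mean_identity(pairs: Iterable[Tuple[str, str]]) -> float:
    """Mean percentage identity over (original, decoded) pairs."""
    scores = [sequence_identity(o, d) for o, d in pairs]
    if not scores:
        raise ValueError("no sequence pairs given")
    return 100.0 * sum(scores) / len(scores)


# Toy sequences, invented for illustration only (not data from the paper).
pairs = [
    ("MKTAYIAK", "MKTAYLAK"),  # 7 of 8 positions match
    ("GSHMCDEF", "GSAMCDQF"),  # 6 of 8 positions match
]
print(f"mean identity: {mean_identity(pairs):.2f}%")
```

In practice the decoded sequence would come from passing each training sequence, together with its conditioning vector, through the trained encoder and decoder; that step is omitted here because the model itself is not described in this excerpt.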
Summary
The computational design and redesign of proteins provides a route to create new protein structures and functions[1,2]. Although a range of machine learning techniques have been applied to site prediction in the past decade[12,13,14,15], they are not prevalent within the design task outside of computationally validating designed sequences before experimental characterization. Another task within the realm of protein design is designing entirely new protein topologies. Typical workflows for the design of novel metalloproteins, and novel protein topologies, consist of designing a complex template followed by computational selection of the best designs before experimental validation. Another field that has seen recent development is that of deep learning research. An aspect of this development has been the rise of powerful new generative methods[18,19,20,21] that leverage deep architectures in order to learn complex distributions[22,23]. The generative mechanism can be used to sample complex synthetic data, and the inference ...