Abstract
An ongoing challenge in protein chemistry is to identify the underlying interaction energies that capture protein dynamics. The traditional trade-off in biomolecular simulation between accuracy and computational efficiency rests on the assumption that detailed force fields are typically well parameterized, so that they achieve a significant fraction of the attainable accuracy. We re-examine this trade-off in the more realistic regime in which parameterization is a greater source of error than the level of detail in the force field. To parameterize coarse-grained force fields, we use the contrastive divergence technique from machine learning to train on simulations of 450 proteins. In our procedure, the computational efficiency of the model enables high accuracy through precise tuning of the Boltzmann ensemble. We apply this method to our recently developed Upside model, in which the side-chain free energy is rapidly calculated at every time step, yielding a smooth energy landscape without steric rattling of the side chains. After contrastive divergence training, the model can fold proteins of up to 100 residues de novo on a single core in days. This improved Upside model provides both a starting point for investigating folding dynamics and an inexpensive Bayesian prior for protein physics that can be integrated with additional experimental or bioinformatic data.
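To make the contrastive divergence idea concrete, the following is a minimal sketch on a hypothetical one-dimensional toy, not the Upside training code. It assumes an energy that is linear in its parameters, E(x) = a·x² + b·x (so the Boltzmann distribution at kT = 1 is Gaussian with mean −b/(2a) and variance 1/(2a)), a synthetic "native ensemble" of 500 points, and short Metropolis simulations started from those native points. The function names, step sizes, and learning rate are illustrative assumptions. Each update lowers the energy of native configurations relative to the configurations the short simulations drift toward.

```python
import numpy as np

def features(x):
    """Hypothetical toy features phi(x); the energy is linear in theta: E = theta . phi."""
    return np.stack([x**2, x], axis=-1)

def energy(x, theta):
    return features(x) @ theta

def metropolis(x, theta, rng, n_steps=20, step=0.5):
    """Run n_steps of Metropolis Monte Carlo (kT = 1) on independent chains starting at x."""
    x = x.copy()
    for _ in range(n_steps):
        x_new = x + step * rng.standard_normal(x.shape)
        d_e = energy(x_new, theta) - energy(x, theta)
        # Accept with probability min(1, exp(-dE)); comparing logs avoids overflow.
        accept = np.log(rng.random(x.shape)) < -d_e
        x = np.where(accept, x_new, x)
    return x

def contrastive_divergence(native, theta, rng, n_iter=500, lr=0.2):
    """CD-k: start short simulations at the native samples, then shift theta so that
    native configurations become lower in energy than the simulated ones."""
    theta = np.asarray(theta, dtype=float).copy()
    phi_native = features(native).mean(axis=0)
    for _ in range(n_iter):
        sampled = metropolis(native, theta, rng)
        phi_sim = features(sampled).mean(axis=0)
        # Log-likelihood gradient: <phi>_model - <phi>_native, with <phi>_model
        # approximated by the short simulations (the contrastive divergence estimate).
        theta += lr * (phi_sim - phi_native)
    return theta

rng = np.random.default_rng(0)
native = rng.normal(loc=0.0, scale=0.5, size=500)  # toy "native ensemble" around x = 0
a, b = contrastive_divergence(native, theta=[0.5, 1.0], rng=rng)
print(f"learned mean {-b / (2 * a):.3f} vs native {native.mean():.3f}")
print(f"learned variance {1 / (2 * a):.3f} vs native {native.var():.3f}")
```

Starting from a model whose well is misplaced (mean −1) and too broad (variance 1), training moves the Boltzmann ensemble onto the native one. Because the chains only need to run a few steps from the native samples, each update stays cheap, which loosely mirrors why a fast model makes this kind of training practical.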
Highlights
Since Anfinsen’s original demonstration that a protein’s sequence determines its structure, multiple computational strategies have been developed to predict a protein’s structure from its sequence
Although all-atom, explicit-solvent methods have succeeded in folding some small proteins, their ability to replicate properties outside the native basin requires substantial improvement [4]
We demonstrate that we can achieve de novo folding for a diverse collection of proteins by combining our fast-equilibrating Upside model with a contrastive divergence procedure that optimizes the stability of the native well
Summary
Since Anfinsen’s original demonstration that a protein’s sequence determines its structure, multiple computational strategies have been developed to predict a protein’s structure from its sequence. An additional facet of this challenge is to replicate the energy landscape that defines both the folding process and other dynamical properties. In the absence of other information, coarse-grained models with one or a few beads per residue are too simplistic for de novo structure prediction. Cβ-level models that have authentic protein backbones with φ/ψ dihedral angles, but lack side-chain rotamers, have achieved some success [1,2,3]. Although all-atom, explicit-solvent methods have succeeded in folding some small proteins, their ability to replicate properties outside the native basin requires substantial improvement [4]. It is unclear which representation provides the optimal combination of detail and computational expense to replicate protein folding and dynamics. Integral to the choice of representation is the choice of which interactions to include, such as hydrogen bonding, van der Waals interactions and hydrophobic burial.