Abstract

Central to studying the conformational changes of a complex protein is understanding the dynamics and energetics involved. Phenomenologically, structural dynamics can be formulated using an overdamped Langevin model along an observable, e.g., the distance between two residues in the protein. The Langevin model is specified by the deterministic force (derived from the potential of mean force, PMF) and the stochastic force (characterized by the diffusion coefficient, D). It is therefore of great interest to extract both PMF and D from the time series of an observable within a single computational framework. Here, we approach this challenge in molecular dynamics (MD) simulations by treating it as a missing-data Bayesian estimation problem. An important distinction of our methodology is that the entire MD trajectory, as opposed to the individual data elements, is used as the statistical variable in Bayesian imputation. This idea is implemented through an eigen-decomposition procedure for a time-symmetrized Fokker-Planck equation, followed by likelihood maximization for parameter estimation. The mathematical expressions for the functional derivatives used in learning PMF and D also provide new physical insights into how information on both the deterministic and stochastic forces is encoded in the dynamics data. An all-atom MD simulation of a nontrivial biomolecule is used to illustrate the application of this approach. Interestingly, we show that the results of trajectory statistical learning can motivate new order parameters for an improved description of the kinetic bottlenecks in conformational changes. Complementing purely data-driven or black-box methods, this work underscores the advantages of physics-based machine learning in gaining chemical insights from quantitative parameter estimation.
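
To make the overdamped Langevin picture concrete, the following is a minimal sketch of the forward model only: it generates a synthetic time series of an observable from a given force and diffusion coefficient. It is not the estimation procedure described in the abstract. Everything in it is an illustrative assumption: the function names, the harmonic PMF, the reduced units (beta = 1/kT = 1), and a position-independent D (a position-dependent D would add a drift term involving dD/dx). The integrator is a plain Euler-Maruyama scheme.

```python
import numpy as np

def simulate_overdamped_langevin(force, D, x0, dt, n_steps, beta=1.0, seed=0):
    """Euler-Maruyama integration of dx = beta*D*force(x)*dt + sqrt(2*D*dt)*xi.

    Assumes a constant diffusion coefficient D; `force` is -dPMF/dx.
    All names and defaults here are hypothetical, chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    x[0] = x0
    # Pre-draw the Gaussian increments of the stochastic force.
    noise = rng.standard_normal(n_steps) * np.sqrt(2.0 * D * dt)
    for i in range(n_steps):
        drift = beta * D * force(x[i]) * dt  # deterministic part (from the PMF)
        x[i + 1] = x[i] + drift + noise[i]   # plus the stochastic kick
    return x

def harmonic_force(x, k=1.0):
    """Force for an assumed harmonic PMF, F(x) = 0.5*k*x**2."""
    return -k * x

# Synthetic "observable" time series in reduced units.
traj = simulate_overdamped_langevin(
    harmonic_force, D=1.0, x0=0.0, dt=1e-3, n_steps=200_000
)
print("sample variance of the observable:", traj.var())
```

For this assumed harmonic PMF the stationary variance should approach 1/(beta*k), which gives a quick sanity check on the PMF side; information on D sits in the time correlations of the series rather than in its equilibrium distribution, which is why both quantities have to be inferred from the dynamics.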
