Abstract

Generating Markov models is a widely used method for capturing the dominant processes of protein dynamics from molecular dynamics (MD) trajectories. In such models, the dynamics are represented as a graph of transition rates (edges) between essential states (nodes), such that transitions depend only on the current state and not on previously visited states. Generating Markov models from MD trajectories is challenging: it requires not only extensive sampling but often also specific knowledge of the analyzed system to handpick suitable reaction coordinates. However, if many different systems need to be analyzed, or suitable reaction coordinates are not at hand, an automated approach to Markov model generation is required. Against this background, we asked to what extent the construction of Markov models from MD trajectories is reproducible and, relatedly, how much of the protein dynamics these models actually capture. We used standard protocols (e.g., tICA, k-means clustering, maximum likelihood transition matrix estimation) and neural network approaches (e.g., VAMPnet) to analyze three independent 1 μs MD trajectories of each of 200 small globular proteins, selected to cover known folds and functions, and compared the resulting timescales. The reproducibility of these timescales depends on the method used to generate the Markov models: neural network approaches yield better reproducibility than standard protocols. We identified the high dimensionality of configuration space as a major factor limiting reproducibility. To address this well-known curse of dimensionality, we used a minimally coupled subspace approach (MCSA) that decomposes configuration space into lower-dimensional independent subspaces. In summary, our results demonstrate that the reproducibility of Markov models generated in an automated way with currently available methods can be improved significantly.
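The standard protocol mentioned above ends with a maximum likelihood transition matrix and its implied timescales, which are the quantities compared across replicas. The following is a minimal numpy sketch of that final step, not the authors' pipeline. It assumes tICA and k-means clustering have already turned each trajectory into a discrete sequence of state indices. It also uses the simple non-reversible estimator (row-normalized counts), whereas many Markov model packages default to a reversible estimator. The helper names (`simulate_chain`, `count_matrix`, `mle_transition_matrix`, `implied_timescales`), the toy two-state matrix, the lag, and the coefficient-of-variation score are hypothetical illustrations. They are not values or metrics from the study.

```python
import bisect
import numpy as np

def simulate_chain(T, n_steps, seed, start=0):
    """Hypothetical helper: sample a discrete trajectory from a known transition matrix."""
    cum = np.cumsum(T, axis=1)
    cum[:, -1] = 1.0  # guard against round-off so every uniform draw maps to a state
    cum_rows = [row.tolist() for row in cum]
    u = np.random.default_rng(seed).random(n_steps).tolist()
    dtraj = [start]
    for t in range(1, n_steps):
        dtraj.append(bisect.bisect_right(cum_rows[dtraj[-1]], u[t]))
    return np.array(dtraj)

def count_matrix(dtraj, n_states, lag):
    """Count transitions i -> j between frames `lag` steps apart (sliding window)."""
    C = np.zeros((n_states, n_states))
    np.add.at(C, (dtraj[:-lag], dtraj[lag:]), 1.0)  # accumulates repeated index pairs
    return C

def mle_transition_matrix(C):
    """Non-reversible maximum likelihood estimate: row-normalized counts.
    Assumes every state has at least one outgoing count."""
    return C / C.sum(axis=1, keepdims=True)

def implied_timescales(T, lag):
    """t_i = -lag / ln|lambda_i| for all eigenvalues except the stationary one."""
    mags = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]  # mags[0] is ~1 (stationary)
    return -lag / np.log(mags[1:])  # slowest process first

# Toy 2-state model: lambda_2 = 1 - 0.01 - 0.02 = 0.97, so t_2 = -1/ln(0.97), about 32.8 steps.
T_true = np.array([[0.99, 0.01], [0.02, 0.98]])
lag = 10

# Three independent toy "trajectories", mimicking three replicas of one protein.
slowest = []
for seed in (1, 2, 3):
    dtraj = simulate_chain(T_true, 200_000, seed)
    T_est = mle_transition_matrix(count_matrix(dtraj, 2, lag))
    slowest.append(implied_timescales(T_est, lag)[0])
slowest = np.array(slowest)

# Hypothetical reproducibility score: coefficient of variation across replicas.
cv = slowest.std() / slowest.mean()
print(slowest.round(1), round(cv, 3))
```

Because the toy data are exactly Markovian, the slowest implied timescale should come out close to 32.8 steps for any lag. For MD data, checking that implied timescales level off as the lag grows is the usual test of whether the Markov assumption holds for a given state decomposition.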

