Abstract

Biomolecular simulations are intrinsically high dimensional and generate noisy data sets of ever-increasing size. Extracting important features from the data is crucial for understanding the biophysical properties of molecular processes, but remains a major challenge. Machine learning (ML) provides powerful dimensionality reduction tools. However, such methods are often criticized as resembling black boxes that offer limited human-interpretable insight. We use methods from supervised and unsupervised ML to efficiently create interpretable maps of important features from molecular simulations. We benchmark the performance of several methods, including neural networks, random forests, and principal component analysis, using a toy model with properties reminiscent of macromolecular behavior. We then analyze three diverse biological processes: conformational changes within the soluble protein calmodulin, ligand binding to a G protein-coupled receptor, and activation of an ion channel voltage-sensor domain, unraveling features critical for signal transduction, ligand binding, and voltage sensing. This work demonstrates the usefulness of ML in understanding biomolecular states and demystifying complex simulations.

Highlights

  • Molecular dynamics (MD) simulations of biological systems provide a unique atomistic insight into many important biological processes, such as a protein’s conformational change between functional states, the folding of a soluble protein, or the effect of ligand binding to a receptor

  • We have demonstrated how to learn ensemble properties from molecular simulations and provide interpretable metrics of important features with prominent machine learning (ML) methods of varying complexity, including principal component analysis (PCA), random forests (RFs), and three types of neural networks (NNs): autoencoders (AEs), restricted Boltzmann machines (RBMs), and multilayer perceptrons (MLPs)

  • We constructed a toy model that mimics real macromolecular behavior to perform a quantitative comparison between the methods and derive insights regarding their applicability for practical purposes
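
As a rough illustration of how PCA, one of the methods listed above, can yield an interpretable per-feature score, the sketch below builds a small synthetic data set and ranks features by their squared loading on the first principal component. Everything here is an assumption made for illustration: the toy data (two features coupled to a hidden two-state variable, four noise features), the function names, and the choice of squared loadings as the importance metric. It is not the authors' toy model or analysis pipeline.

```python
import math
import random


def make_toy_frames(n_frames=400, n_features=6, seed=0):
    """Hypothetical toy data: features 0 and 1 follow a shared two-state
    variable; all other features are independent Gaussian noise."""
    rng = random.Random(seed)
    frames = []
    for _ in range(n_frames):
        state = rng.choice((-1.0, 1.0))  # hidden "conformational state"
        row = [rng.gauss(0.0, 0.3) for _ in range(n_features)]
        row[0] += 2.0 * state
        row[1] -= 1.5 * state
        frames.append(row)
    return frames


def covariance(frames):
    """Sample covariance matrix of the features (columns)."""
    n = len(frames)
    d = len(frames[0])
    means = [sum(row[j] for row in frames) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for row in frames:
        centered = [row[j] - means[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += centered[i] * centered[j] / (n - 1)
    return cov


def leading_component(cov, n_iter=200):
    """Approximate the top eigenvector of cov by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def pca_feature_importance(frames):
    """Squared loadings of the first principal component (they sum to 1)."""
    v = leading_component(covariance(frames))
    return [x * x for x in v]


frames = make_toy_frames()
importance = pca_feature_importance(frames)
# Feature indices sorted from most to least important
ranked = sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)
print(ranked)
```

In this construction the two coupled features dominate the first component, so they appear at the top of the ranking. For real trajectories, a library implementation of PCA and more than one component would normally be used; the pure-Python version only keeps the example dependency-free.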


Summary

Introduction

Molecular dynamics (MD) simulations of biological systems provide a unique atomistic insight into many important biological processes, such as a protein’s conformational change between functional states, the folding of a soluble protein, or the effect of ligand binding to a receptor. These systems can be extremely high dimensional, with pairwise interactions between tens to hundreds of thousands of atoms at every snapshot in time. Nevertheless, the system typically moves on a manifold of much lower dimensionality than its actual number of degrees of freedom. If the aim is to enhance sampling of transitions from one functional state to another, we typically seek the collective variables (CVs) that best describe the slowest motion

