Abstract

Extracting insight from the enormous quantity of data generated from molecular simulations requires the identification of a small number of collective variables whose corresponding low-dimensional free-energy landscape retains the essential features of the underlying system. Data-driven techniques provide a systematic route to constructing this landscape, without the need for extensive a priori intuition into the relevant driving forces. In particular, autoencoders are powerful tools for dimensionality reduction, as they naturally force an information bottleneck and, thereby, a low-dimensional embedding of the essential features. While variational autoencoders ensure continuity of the embedding by assuming a unimodal Gaussian prior, this is at odds with the multi-basin free-energy landscapes that typically arise from the identification of meaningful collective variables. In this work, we incorporate this physical intuition into the prior by employing a Gaussian mixture variational autoencoder (GMVAE), which encourages the separation of metastable states within the embedding. The GMVAE performs dimensionality reduction and clustering within a single unified framework, and is capable of identifying the inherent dimensionality of the input data, in terms of the number of Gaussians required to categorize the data. We illustrate our approach on two toy models, alanine dipeptide, and a challenging disordered peptide ensemble, demonstrating the enhanced clustering effect of the GMVAE prior compared to standard VAEs. The resulting embeddings appear to be promising representations for constructing Markov state models, highlighting the transferability of the dimensionality reduction from static equilibrium properties to dynamics.
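The contrast between a unimodal Gaussian prior and a Gaussian mixture prior can be illustrated numerically. The sketch below (illustrative values only, not taken from the paper) evaluates the log-density of a two-component mixture prior in a 2-D latent space, showing that probability mass concentrates at the component means, i.e. at distinct metastable states, rather than in a single origin-centred mode:

```python
import numpy as np

def log_gauss(z, mu, sigma):
    """Log-density of an isotropic Gaussian N(mu, sigma^2 I) at points z."""
    d = z.shape[-1]
    sq = np.sum((z - mu) ** 2, axis=-1)
    return -0.5 * (sq / sigma**2 + d * np.log(2 * np.pi * sigma**2))

def log_mixture_prior(z, mus, sigmas, weights):
    """log p(z) = log sum_k w_k N(z | mu_k, sigma_k^2 I), via a stable logsumexp."""
    comps = np.stack([np.log(w) + log_gauss(z, mu, s)
                      for mu, s, w in zip(mus, sigmas, weights)], axis=0)
    m = comps.max(axis=0)
    return m + np.log(np.exp(comps - m).sum(axis=0))

# Two well-separated "metastable states" in a 2-D latent space (hypothetical values)
mus = [np.array([-2.0, 0.0]), np.array([2.0, 0.0])]
sigmas = [0.5, 0.5]
weights = [0.5, 0.5]

# Evaluate at the two component means and at the barrier region between them
z = np.array([[-2.0, 0.0], [0.0, 0.0], [2.0, 0.0]])
print(log_mixture_prior(z, mus, sigmas, weights))
```

The log-density is highest at the two means and drops sharply at the midpoint, which is precisely the multi-basin structure a unimodal prior cannot express.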

Highlights

  • Particle-based computer simulations can provide unprecedented mechanistic insight into the driving forces of complex molecular systems, in contexts ranging from biochemistry to materials science [1, 2, 3]

  • In the case of hierarchical input data, we show that the Gaussian mixture variational autoencoder (GMVAE) makes a reasonable prediction for the number of clusters, independent of the given hyperparameter, based on the dimensionality of the latent space and characteristics of the data

  • The resulting Gaussian mixture variational autoencoder (GMVAE) adopts the physics-based viewpoint that an optimal embedding of the simulation data should give rise to a free-energy landscape (FEL) with well-separated clusters of configurations, which correspond to metastable states separated by large barriers along the high-dimensional potential energy landscape
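The highlight about predicting the number of clusters can be pictured with a simple, generic sketch: in a trained mixture model, the weights of unneeded components tend to shrink toward zero, so counting components above a small cutoff estimates the inherent number of clusters. The weights and cutoff below are hypothetical, and this is not the paper's exact thresholding scheme:

```python
import numpy as np

# Hypothetical learned mixture weights for K=8 components; in practice,
# superfluous components often collapse toward zero weight during training.
weights = np.array([0.31, 0.28, 0.22, 0.17, 0.01, 0.006, 0.003, 0.001])

threshold = 0.05  # illustrative cutoff, not the paper's criterion
n_clusters = int(np.sum(weights > threshold))
print(n_clusters)  # -> 4
```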

Introduction

Particle-based computer simulations can provide unprecedented mechanistic insight into the driving forces of complex molecular systems, in contexts ranging from biochemistry to materials science [1, 2, 3]. The autoencoder aims to discover a latent space (embedding) that faithfully describes the essential features of the high-dimensional input data. This makes autoencoders well suited for constructing low-dimensional FELs from molecular simulation data [22, 23, 24]. Autoencoder-based approaches were recently extended to explicitly incorporate the temporal nature of the data via a time-lag in the network architecture [27, 28]. These time-lagged autoencoders aim to retain information about the slowest dynamical modes sampled in the underlying simulation trajectory and, as a consequence, may encourage metastable clustering in the latent space. In contrast to recent deep neural-network approaches that aim to directly model the propagator of the system’s dynamics [31, 32], the construction of Markov state models (MSMs) from the learned FEL offers a different strategy: explicitly testing to what extent a representation appropriate for the statics is directly amenable to the dynamics.
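As a minimal illustration of the information bottleneck described above, the sketch below trains a linear autoencoder (plain NumPy on synthetic data; a stand-in for the nonlinear networks used in practice) to compress 3-D points lying near a line down to a single latent coordinate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "configurations": 3-D points near a 1-D line, standing in for
# high-dimensional simulation coordinates with one dominant collective variable.
t = rng.normal(size=(500, 1))
X = t @ np.array([[1.0, 2.0, -1.0]]) + 0.05 * rng.normal(size=(500, 3))

# Linear autoencoder with a 1-D bottleneck: encoder W_e, decoder W_d.
W_e = rng.normal(scale=0.1, size=(3, 1))
W_d = rng.normal(scale=0.1, size=(1, 3))

def recon_error(X, W_e, W_d):
    return np.mean((X @ W_e @ W_d - X) ** 2)

err0 = recon_error(X, W_e, W_d)
lr = 0.01
for _ in range(2000):
    Z = X @ W_e                              # 1-D latent embedding
    R = Z @ W_d - X                          # reconstruction residual
    grad_Wd = 2 * Z.T @ R / len(X)           # d(loss)/dW_d
    grad_We = 2 * X.T @ (R @ W_d.T) / len(X) # d(loss)/dW_e
    W_d -= lr * grad_Wd
    W_e -= lr * grad_We

err1 = recon_error(X, W_e, W_d)
print(f"reconstruction MSE: {err0:.4f} -> {err1:.4f}")
```

After training, the single latent coordinate captures nearly all of the variance, which is the bottleneck effect the autoencoder exploits; a (GM)VAE replaces the deterministic encoder with a probabilistic one and regularizes the latent space toward its prior.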

Autoencoder
Gaussian Mixture Variational Autoencoder
Determination of Cluster Labels and Thresholding Scheme
GMVAE Architecture and Training Hyperparameters
Markov State Models
Peptide Analysis
Results
One-dimensional 4-well Potential
Müller-Brown Potential
Alanine Dipeptide
AAQAA3 Peptide - I
AAQAA3 Peptide - II
Discussion and Conclusions