Abstract

Extracting insight from the enormous quantity of data generated from molecular simulations requires the identification of a small number of collective variables whose corresponding low-dimensional free-energy landscape retains the essential features of the underlying system. Data-driven techniques provide a systematic route to constructing this landscape, without the need for extensive a priori intuition into the relevant driving forces. In particular, autoencoders are powerful tools for dimensionality reduction, as they naturally force an information bottleneck and, thereby, a low-dimensional embedding of the essential features. While variational autoencoders ensure continuity of the embedding by assuming a unimodal Gaussian prior, this is at odds with the multi-basin free-energy landscapes that typically arise from the identification of meaningful collective variables. In this work, we incorporate this physical intuition into the prior by employing a Gaussian mixture variational autoencoder (GMVAE), which encourages the separation of metastable states within the embedding. The GMVAE performs dimensionality reduction and clustering within a single unified framework, and is capable of identifying the inherent dimensionality of the input data, in terms of the number of Gaussians required to categorize the data. We illustrate our approach on two toy models, alanine dipeptide, and a challenging disordered peptide ensemble, demonstrating the enhanced clustering effect of the GMVAE prior compared to standard VAEs. The resulting embeddings appear to be promising representations for constructing Markov state models, highlighting the transferability of the dimensionality reduction from static equilibrium properties to dynamics.
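The contrast between a unimodal Gaussian prior and a Gaussian mixture prior can be illustrated numerically. The sketch below (illustrative values only, not taken from the paper) evaluates the log-density of a two-component mixture prior in a 2-D latent space, showing that probability mass concentrates at the component means, i.e. at distinct metastable states, rather than in a single origin-centred mode:

```python
import numpy as np

def log_gauss(z, mu, sigma):
    """Log-density of an isotropic Gaussian N(mu, sigma^2 I) at points z."""
    d = z.shape[-1]
    sq = np.sum((z - mu) ** 2, axis=-1)
    return -0.5 * (sq / sigma**2 + d * np.log(2 * np.pi * sigma**2))

def log_mixture_prior(z, mus, sigmas, weights):
    """log p(z) = log sum_k w_k N(z | mu_k, sigma_k^2 I), via a stable logsumexp."""
    comps = np.stack([np.log(w) + log_gauss(z, mu, s)
                      for mu, s, w in zip(mus, sigmas, weights)], axis=0)
    m = comps.max(axis=0)
    return m + np.log(np.exp(comps - m).sum(axis=0))

# Two well-separated "metastable states" in a 2-D latent space (hypothetical values)
mus = [np.array([-2.0, 0.0]), np.array([2.0, 0.0])]
sigmas = [0.5, 0.5]
weights = [0.5, 0.5]

# Evaluate at the two component means and at the barrier region between them
z = np.array([[-2.0, 0.0], [0.0, 0.0], [2.0, 0.0]])
print(log_mixture_prior(z, mus, sigmas, weights))
```

The log-density is highest at the two means and drops sharply at the midpoint, which is precisely the multi-basin structure a unimodal prior cannot express.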

Highlights

  • Particle-based computer simulations can provide unprecedented mechanistic insight into the driving forces of complex molecular systems, in contexts ranging from biochemistry to materials science [1, 2, 3]

  • In the case of hierarchical input data, we show that the Gaussian mixture variational autoencoder (GMVAE) makes a reasonable prediction for the number of clusters, independent of the given hyperparameter, based on the dimensionality of the latent space and characteristics of the data

  • The resulting Gaussian mixture variational autoencoder (GMVAE) adopts the physics-based viewpoint that an optimal embedding of the simulation data should give rise to a free-energy landscape (FEL) with well-separated clusters of configurations, which correspond to metastable states separated by large barriers along the high-dimensional potential energy landscape
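The highlight about predicting the number of clusters can be pictured with a simple, generic sketch: in a trained mixture model, the weights of unneeded components tend to shrink toward zero, so counting components above a small cutoff estimates the inherent number of clusters. The weights and cutoff below are hypothetical, and this is not the paper's exact thresholding scheme:

```python
import numpy as np

# Hypothetical learned mixture weights for K=8 components; in practice,
# superfluous components often collapse toward zero weight during training.
weights = np.array([0.31, 0.28, 0.22, 0.17, 0.01, 0.006, 0.003, 0.001])

threshold = 0.05  # illustrative cutoff, not the paper's criterion
n_clusters = int(np.sum(weights > threshold))
print(n_clusters)  # -> 4
```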

Introduction

Particle-based computer simulations can provide unprecedented mechanistic insight into the driving forces of complex molecular systems, in contexts ranging from biochemistry to materials science [1, 2, 3]. The autoencoder aims to discover a latent space (embedding) that faithfully describes the essential features of the high-dimensional input data. This makes autoencoders well suited for constructing low-dimensional FELs from molecular simulation data [22, 23, 24]. Autoencoder-based approaches were recently extended to explicitly incorporate the temporal nature of the data via a time-lag in the network architecture [27, 28]. These time-lagged autoencoders aim to retain information about the slowest dynamical modes sampled in the underlying simulation trajectory and, as a consequence, may encourage metastable clustering in the latent space. In contrast to recent deep neural-network approaches that aim to directly model the propagator of the system’s dynamics [31, 32], the construction of Markov state models (MSMs) from the learned FEL offers a different strategy: explicitly testing to what extent a representation appropriate for the statics is directly amenable to the dynamics.
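As a minimal illustration of the information bottleneck described above, the sketch below trains a linear autoencoder (plain NumPy on synthetic data; a stand-in for the nonlinear networks used in practice) to compress 3-D points lying near a line down to a single latent coordinate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "configurations": 3-D points near a 1-D line, standing in for
# high-dimensional simulation coordinates with one dominant collective variable.
t = rng.normal(size=(500, 1))
X = t @ np.array([[1.0, 2.0, -1.0]]) + 0.05 * rng.normal(size=(500, 3))

# Linear autoencoder with a 1-D bottleneck: encoder W_e, decoder W_d.
W_e = rng.normal(scale=0.1, size=(3, 1))
W_d = rng.normal(scale=0.1, size=(1, 3))

def recon_error(X, W_e, W_d):
    return np.mean((X @ W_e @ W_d - X) ** 2)

err0 = recon_error(X, W_e, W_d)
lr = 0.01
for _ in range(2000):
    Z = X @ W_e                              # 1-D latent embedding
    R = Z @ W_d - X                          # reconstruction residual
    grad_Wd = 2 * Z.T @ R / len(X)           # d(loss)/dW_d
    grad_We = 2 * X.T @ (R @ W_d.T) / len(X) # d(loss)/dW_e
    W_d -= lr * grad_Wd
    W_e -= lr * grad_We

err1 = recon_error(X, W_e, W_d)
print(f"reconstruction MSE: {err0:.4f} -> {err1:.4f}")
```

After training, the single latent coordinate captures nearly all of the variance, which is the bottleneck effect the autoencoder exploits; a (GM)VAE replaces the deterministic encoder with a probabilistic one and regularizes the latent space toward its prior.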

Autoencoder
Gaussian Mixture Variational Autoencoder
Determination of Cluster Labels and Thresholding Scheme
GMVAE Architecture and Training Hyperparameters
Markov State Models
Peptide Analysis
Results
One-dimensional 4-well Potential
Müller-Brown Potential
Alanine Dipeptide
AAQAA3 Peptide - I
AAQAA3 Peptide - II
Discussion and Conclusions