Unsupervised Learning Methods for Molecular Simulation Data.

Aldo Glielmo, Cecilia Clementi, Alessandro Laio, Frank Noé, Alex Rodriguez, Brooke E Husic

doi:10.1021/acs.chemrev.0c01195

Abstract

Unsupervised learning is becoming an essential tool to analyze the increasingly large amounts of data produced by atomistic and molecular simulations in materials science, solid state physics, biophysics, and biochemistry. In this Review, we provide a comprehensive overview of the methods of unsupervised learning that have been most commonly used to investigate simulation data and indicate likely directions for further developments in the field. In particular, we discuss feature representation of molecular systems and present state-of-the-art algorithms of dimensionality reduction, density estimation, clustering, and kinetic models. We divide our discussion into self-contained sections, each discussing a specific method. In each section, we briefly touch upon the mathematical and algorithmic foundations of the method, highlight its strengths and limitations, and describe the specific ways in which it has been used (or can be used) to analyze molecular simulation data.

Highlights

In recent years, we have witnessed a substantial expansion in the amount of data generated by molecular simulation.
Throughout the review, we present these techniques, highlighting their specific application to the analysis of molecular dynamics and discussing their advantages and disadvantages in this context.
Time-lagged independent component analysis (TICA) has been leveraged to analyze a variety of biomolecular systems from both simulation and experimental data, including the dynamics of protein folding,[252] disordered proteins,[317] protein-peptide and protein-protein association,[29,255] protein conformational change and ligand binding,[318] binding-induced folding,[256] and kinase functional dynamics.[257] TICA has also been integrated into enhanced sampling algorithms.[319,320]
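The TICA highlight names the method without stating what it computes. As a hedged illustration (our own sketch, not code from the Review): TICA looks for linear combinations of the input features with maximal autocorrelation at a lag time τ, which amounts to solving the generalized eigenvalue problem C(τ)v = λC(0)v, where C(0) and C(τ) are the instantaneous and time-lagged covariance matrices of the mean-free features. The helper name `tica`, the symmetrized covariance estimator, and the synthetic two-feature trajectory are illustrative assumptions; NumPy and SciPy are assumed to be available.

```python
import numpy as np
from scipy.linalg import eigh


def tica(X, lag):
    """Minimal TICA sketch (hypothetical helper, not the authors' code).

    X   : array of shape (T, d), one feature vector per simulation frame
    lag : lag time tau, in frames
    Returns eigenvalues (autocorrelations at lag tau, slowest first) and
    the corresponding TICA directions as columns.
    """
    X = X - X.mean(axis=0)                    # center the features
    X0, Xt = X[:-lag], X[lag:]                # pairs (x_t, x_{t+tau})
    n = X0.shape[0]
    # Symmetrized estimates of C(0) and C(tau) (assumes reversible sampling)
    C0 = (X0.T @ X0 + Xt.T @ Xt) / (2 * n)
    Ct = (X0.T @ Xt + Xt.T @ X0) / (2 * n)
    # Generalized eigenproblem C(tau) v = lambda C(0) v, eigenvalues ascending
    evals, evecs = eigh(Ct, C0)
    return evals[::-1], evecs[:, ::-1]        # slowest process first


# Toy data: a slow AR(1) coordinate hidden in two mixed features
rng = np.random.default_rng(0)
n_frames = 20000
slow = np.zeros(n_frames)
for t in range(1, n_frames):
    slow[t] = 0.99 * slow[t - 1] + rng.normal()
fast = rng.normal(size=n_frames)              # uncorrelated (fast) noise
X = np.column_stack([slow + fast, slow - fast])

evals, evecs = tica(X, lag=10)
tic1 = (X - X.mean(axis=0)) @ evecs[:, 0]     # projection on the slowest TIC
```

For this toy process the largest eigenvalue should come out close to the lag-10 autocorrelation of the hidden slow coordinate (0.99^10 ≈ 0.90), and `tic1` should track `slow` up to sign and scale. Symmetrizing the covariances is one common choice that keeps the eigenvalues real, at the cost of assuming approximately reversible, equilibrium sampling.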

Summary

INTRODUCTION

We have witnessed a substantial expansion in the amount of data generated by molecular simulation. A striking example is given by the kinetics of complex conformational changes in biomolecules, which, on long time scales, can be well described by transition rates between a few discrete states. Symmetries, such as the invariance of physical properties under translation, rotation, or permutation of equivalent particles, can be leveraged to obtain a more compact representation of simulation data. Kinetic modeling approaches are qualitatively based on the requirement that a meaningful low-dimensional model should reproduce the relevant time-correlation properties of the original dynamics (e.g., the transition rates). Other valuable review articles of potential significance to the reader interested in machine learning for molecular and materials science are refs 5–9.
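To make the idea of describing long-time kinetics by transition rates between a few discrete states more concrete, here is a minimal sketch of a Markov state model estimator, written by us and not taken from the Review. It counts transitions at a lag time τ in a discrete state trajectory, row-normalizes the counts into a transition matrix, and converts its eigenvalues λ_i into implied timescales t_i = -τ / ln|λ_i|, which is one way to check that the model reproduces the slow time-correlation properties of the dynamics. The helper names, the simple non-reversible maximum-likelihood estimator (chosen for brevity; production packages typically also enforce detailed balance), and the two-state toy chain are all illustrative assumptions.

```python
import numpy as np


def count_matrix(dtraj, n_states, lag):
    """Sliding-window transition counts c_ij at lag time tau (in frames)."""
    C = np.zeros((n_states, n_states))
    np.add.at(C, (dtraj[:-lag], dtraj[lag:]), 1)  # c_ij += 1 for each (i, j) pair
    return C


def transition_matrix(C):
    """Non-reversible maximum-likelihood estimate T_ij = c_ij / sum_k c_ik.
    Assumes every state has at least one outgoing count."""
    return C / C.sum(axis=1, keepdims=True)


def implied_timescales(T, lag):
    """t_i = -tau / ln|lambda_i| for all eigenvalues except the stationary one."""
    evals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    return -lag / np.log(evals[1:])


# Toy data: a two-state Markov chain with a known transition matrix
P = np.array([[0.99, 0.01],
              [0.02, 0.98]])
rng = np.random.default_rng(1)
n_steps = 100000
u = rng.random(n_steps)
dtraj = np.zeros(n_steps, dtype=int)
for t in range(1, n_steps):
    # jump to state 1 with probability P[current, 1], otherwise go to state 0
    dtraj[t] = 1 if u[t] < P[dtraj[t - 1], 1] else 0

T = transition_matrix(count_matrix(dtraj, n_states=2, lag=1))
its = implied_timescales(T, lag=1)
```

The exact slow eigenvalue of `P` is 0.97, so the estimated implied timescale should land near -1/ln(0.97) ≈ 33 steps. In a real application, `dtraj` would come from clustering simulation frames, and one would check that the implied timescales stay roughly constant as the lag time grows.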

FEATURE REPRESENTATION

Representations for Macromolecular Systems

Representations for Condensed Matter Systems

Representation Learning

DIMENSIONALITY REDUCTION AND MANIFOLD LEARNING

Linear Dimensionality Reduction Methods

Nonlinear Dimensionality Reduction

DENSITY ESTIMATION

Parametric Density Estimation

Nonparametric Density Estimation

CLUSTERING

Partitioning Schemes

Density-Based Clustering

KINETIC MODELS

Time-Lagged Independent Component Analysis

Variational Approach to Conformational Dynamics

Markov State Modeling

Koopman Models and VAMP

VAMPnets

Feature Representations

Dimensionality Reduction

Density Estimation

Clustering

CONCLUSION AND DISCUSSION

Journal: Chemical Reviews
Publication Date: May 4, 2021
Citations: 214
License type: CC BY 4.0

Similar Papers

Molecular simulation and macroscopic modeling of the diffusion of hydrogen, carbon monoxide and water in heavy n-alkane mixtures
Zoi A Makrodimitri ... Ioannis G Economou
Physical Chemistry Chemical Physics | VOL. 14 | 01 Jan 2012

Management of Molecular Simulation Database
Anand Kumar ... Sagar Pandit
Biophysical Journal | VOL. 106 | 01 Jan 2014

Adapting SAFT-γ perturbation theory to site-based molecular dynamics simulation. III. Molecules with partial charges at bulk phases, confined geometries and interfaces.
Ahmadreza F Ghobadi ... J Richard Elliott
The Journal of Chemical Physics | VOL. 141 | 05 Sep 2014

Comparison of equations of state for pure Lennard–Jones fluids and mixtures with molecular simulation data
Zhi-Ping Liu ... Jiu-Fang Lu
Fluid Phase Equilibria | VOL. 173 | 01 May 2000