Abstract

Molecular simulation trajectories are high-dimensional data. Such data can be visualized by dimensionality reduction methods. Non-linear dimensionality reduction methods are likely to be more efficient than linear ones because the motions of atoms are non-linear. Here we test a popular non-linear method, t-distributed Stochastic Neighbor Embedding (t-SNE), on the analysis of trajectories of 200 ns alanine dipeptide dynamics and 208 μs Trp-cage folding and unfolding. Furthermore, we introduce a time-lagged variant of t-SNE in order to focus on rarely occurring transitions in the molecular system. Time-lagged t-SNE efficiently separates states according to their distance in time. Using this method, it is possible to visualize the key states of the studied systems (e.g., unfolded and folded protein), as well as possible kinetic traps, in a two-dimensional plot. Time-lagged t-SNE is a visualization method; other applications, such as clustering and free energy modeling, must be done with caution.
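
The abstract does not say how the time lag enters t-SNE. One plausible construction, used here purely as an assumption, is a TICA-style preprocessing step: the coordinates are whitened, rotated onto the eigenvectors of the symmetrized time-lagged covariance matrix, and each component is scaled by its eigenvalue (its autocorrelation at the chosen lag), so that slowly decorrelating motions dominate the Euclidean distances that an ordinary t-SNE then embeds. The function name `time_lagged_transform` and its parameters are hypothetical; this is a sketch, not the authors' implementation.

```python
import numpy as np

def time_lagged_transform(X, lag):
    """Hypothetical TICA-style preprocessing (an assumption, not the source's
    stated method): whiten the data, project onto eigenvectors of the
    symmetrized time-lagged covariance, and scale each component by its
    eigenvalue so that slow motions dominate distances."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                    # center each feature
    X0, Xt = X[:-lag], X[lag:]                # frames t and t + lag
    C0 = X.T @ X / len(X)                     # instantaneous covariance
    evals, evecs = np.linalg.eigh(C0)
    keep = evals > 1e-10 * evals.max()        # drop near-singular directions
    W = evecs[:, keep] / np.sqrt(evals[keep]) # whitening matrix
    Y0, Yt = X0 @ W, Xt @ W
    Ct = Y0.T @ Yt / len(Y0)                  # time-lagged covariance (whitened)
    Ct = 0.5 * (Ct + Ct.T)                    # symmetrize
    lam, V = np.linalg.eigh(Ct)
    order = np.argsort(lam)[::-1]             # slowest components first
    lam, V = lam[order], V[:, order]
    # Scale by autocorrelation: fast, weakly correlated components shrink
    return (X @ W) @ V * lam
```

The returned array would then be passed to a standard t-SNE implementation, for example scikit-learn's `sklearn.manifold.TSNE(n_components=2).fit_transform(Y)`. Because fast, weakly autocorrelated motions are scaled toward zero, frames separated by a rare transition end up far apart in the input space that t-SNE sees.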

Highlights

  • The main goal of molecular simulations is to identify the key states of the studied systems and to build thermodynamic and kinetic models of transitions between these states

  • Identification of key states is often based on numerical descriptors known as collective variables

  • Collective variables are a form of dimensionality reduction because they represent the high-dimensional structure of a molecular system using a few numerical descriptors
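
As an illustration not taken from the source, a typical collective variable for alanine dipeptide is a backbone dihedral angle (φ or ψ), which compresses the Cartesian coordinates of four atoms into a single number. The sketch below computes a dihedral angle in degrees from four positions using the IUPAC sign convention; the function name `dihedral` is hypothetical.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (range -180..180) defined by four atom
    positions, e.g. the backbone phi angle C(i-1)-N-CA-C."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)          # unit vector along the central bond
    # Components of b0 and b2 perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.degrees(np.arctan2(y, x))   # arctan2 keeps the sign of the angle
```

Evaluating such a function for every frame turns a trajectory of many atomic coordinates into a short time series, which is exactly the dimensionality reduction the highlights describe.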


Summary

INTRODUCTION

The main goal of molecular simulations is to identify the key states of the studied systems and to build thermodynamic and kinetic models of transitions between these states. An advantage of non-linear dimensionality reduction methods is their ability to describe more of the variance in the data than linear methods with the same number of dimensions. This is especially true for t-distributed Stochastic Neighbor Embedding (t-SNE) (van der Maaten and Hinton, 2008). A second advantage of t-SNE is that it tends to equalize the density of points in the low-dimensional output space. This feature, which is controlled by a parameter called perplexity, makes the visual representation of points more effective. A low perplexity forces focus on the local structure of the input data, whereas a larger perplexity (e.g., 50) takes more of the global structure into account. As discussed later, this feature improves visualization by t-SNE, but at the same time it complicates its application in situations where preservation of densities is required.

The method was tested on two molecular trajectories: a 200 ns simulation of alanine dipeptide and a 208.8 μs simulation of Trp-cage miniprotein folding and unfolding (trajectory kindly provided by DE Shaw Research) (Lindorff-Larsen et al., 2011).
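
The perplexity mechanism can be sketched following the original t-SNE formulation (van der Maaten and Hinton, 2008): for each point i, the bandwidth of a Gaussian kernel is tuned by bisection until the perplexity of the conditional distribution p_{j|i} equals the target value. Because every point receives its own bandwidth, dense and sparse regions of the input contribute neighborhoods of similar effective size, which is why densities in the embedding tend to be equalized. The low-dimensional affinities use a heavy-tailed Student-t kernel. This is a minimal NumPy sketch, not the code used in the study; the function names are hypothetical, and the gradient-descent optimization of the embedding is omitted.

```python
import numpy as np

def squared_distances(X):
    sq = np.sum(X**2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)

def conditional_probabilities(X, perplexity=30.0, tol=1e-5, max_iter=100):
    """p_{j|i} with a per-point Gaussian bandwidth chosen by bisection so that
    exp(H_i) (entropy H_i in nats, equivalently 2**H_i in bits) = perplexity."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    D = squared_distances(X)
    P = np.zeros((n, n))
    target = np.log(perplexity)
    for i in range(n):
        d = np.delete(D[i], i)                     # exclude self-distance
        beta_lo, beta_hi, beta = 0.0, np.inf, 1.0  # beta = 1 / (2 sigma_i^2)
        for _ in range(max_iter):
            w = np.exp(-(d - d.min()) * beta)      # shift for numerical stability
            p = w / w.sum()
            H = -np.sum(p * np.log(np.maximum(p, 1e-300)))
            if abs(H - target) < tol:
                break
            if H > target:                         # too flat: narrow the kernel
                beta_lo = beta
                beta = beta * 2 if beta_hi == np.inf else (beta + beta_hi) / 2
            else:                                  # too peaked: widen the kernel
                beta_hi = beta
                beta = (beta + beta_lo) / 2
        P[i, np.arange(n) != i] = p
    return P

def joint_probabilities(P):
    """Symmetrized input affinities p_ij = (p_{j|i} + p_{i|j}) / (2n)."""
    return (P + P.T) / (2 * len(P))

def student_t_affinities(Y):
    """Low-dimensional affinities q_ij from a Student-t kernel (1 + d^2)^-1."""
    W = 1.0 / (1.0 + squared_distances(np.asarray(Y, dtype=float)))
    np.fill_diagonal(W, 0.0)
    return W / W.sum()
```

t-SNE then moves the low-dimensional points to minimize the Kullback-Leibler divergence between the joint input affinities and the Student-t affinities; the heavy tail of the Student-t kernel lets moderately distant points separate in the plot.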
