Abstract

Recent advances in simulation and experiment have led to dramatic increases in the quantity and complexity of produced data, which makes the development of automated analysis tools very important. A powerful approach to analyzing the dynamics contained in such data sets is to describe or approximate them by diffusion on a free energy landscape, that is, free energy as a function of reaction coordinates (RCs). For the description to be quantitatively accurate, the RCs should be chosen in an optimal way. Recent theoretical results show that such an optimal RC exists; however, determining it for practical systems is a very difficult unsolved problem. Here we describe a solution to this problem: an adaptive nonparametric approach to accurately determine the optimal RC (the committor) for an equilibrium trajectory of a realistic system. In contrast to alternative approaches, which require a functional form with many parameters to approximate an RC and thus extensive expertise with the system, the suggested approach is nonparametric and can approximate any RC with high accuracy without system-specific information. To avoid overfitting for a realistically sampled system, the approach performs RC optimization in an adaptive manner by focusing optimization on less optimized spatiotemporal regions of the RC. The power of the approach is illustrated on a long equilibrium atomistic folding simulation of the HP35 protein. We have determined the optimal folding RC (the committor), which was confirmed by passing a stringent committor validation test. It allowed us to determine a first quantitatively accurate protein folding free energy landscape. We have confirmed the recent theoretical results that diffusion on such a free energy profile can be used to compute exactly the equilibrium flux, the mean first passage times, and the mean transition path times between any two points on the profile. We have shown that the mean squared displacement along the optimal RC grows linearly with time, as for simple diffusion. The free energy profile allowed us to obtain a direct rigorous estimate of the pre-exponential factor for the folding dynamics.

Highlights

  • Due to advances in computer hardware and simulation methodology, it is becoming increasingly easy to generate large simulation data sets of complex molecular systems, with a prominent example being the long equilibrium trajectories of fast-folding proteins.[1,2] Because of the complexity of the dynamics and the high dimensionality of the resulting trajectories, the generation of many trajectories per se is not sufficient to provide full scientific insight.

  • Suboptimal spatiotemporal regions can be detected using $Z_{C,1}(r, \Delta t)$ cut profiles, an important quantity for reaction coordinate (RC) analysis that can be computed straightforwardly from the RC time series $r(k\Delta t_0)$ and whose properties we briefly summarize below.[5,22]

  • Using the committor for the analysis and description of the folding dynamics may not be very convenient, as the diffusion coefficient varies significantly along the coordinate: $D(q) = J_{AB}/P_{eq}(q) = Z_H(q)^{-1} N_{AB}/\Delta t$, where $P_{eq}(q)$ is the equilibrium probability and $Z_H(q)$ is the corresponding histogram density computed from $q(k\Delta t_0)$.[5,19,20,22]
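
As an illustration of the last highlight, the relation between the diffusion coefficient, the histogram density, and the number of transitions can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `count_ab_transitions` and `diffusion_profile` are hypothetical, the boundary states A and B are assumed to sit at q = 0 and q = 1, N_AB is assumed to count A-to-B transitions only, and Δt is taken to be the sampling interval of the time series.

```python
import numpy as np

def count_ab_transitions(q, a=0.0, b=1.0):
    """Count visits to B (q >= b) whose most recent boundary visit was A (q <= a).

    Assumption: this one-directional count stands in for N_AB.
    """
    n_ab = 0
    last = None  # last boundary state visited: "A", "B", or None
    for x in q:
        if x <= a:
            last = "A"
        elif x >= b:
            if last == "A":
                n_ab += 1
            last = "B"
    return n_ab

def diffusion_profile(q, dt, bins=20):
    """Estimate D(q) = N_AB / (dt * Z_H(q)) from a committor time series.

    Z_H is the histogram density: counts per unit length of q.
    Bins with no samples get D = NaN.
    """
    q = np.asarray(q, dtype=float)
    counts, edges = np.histogram(q, bins=bins, range=(0.0, 1.0))
    width = edges[1] - edges[0]
    z_h = counts / width
    n_ab = count_ab_transitions(q)
    with np.errstate(divide="ignore", invalid="ignore"):
        d = np.where(z_h > 0, n_ab / (dt * z_h), np.nan)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, z_h, d

# Toy time series (not simulation data), just to exercise the functions.
q_toy = [0.0, 0.5, 1.0, 0.5, 0.0, 0.5, 1.0]
centers, z_h, d = diffusion_profile(q_toy, dt=1.0, bins=2)
```

Because N_AB/Δt is a single number for the whole trajectory, the estimated D(q) in this sketch is simply inversely proportional to the histogram density, which is why it varies strongly wherever the population along the committor does.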


Summary

Introduction

Due to advances in computer hardware and simulation methodology, it is becoming increasingly easy to generate large simulation data sets of complex molecular systems, with a prominent example being the long equilibrium trajectories of fast-folding proteins.[1,2] Because of the complexity of the dynamics and the high dimensionality of the resulting trajectories, the generation of many trajectories per se is not sufficient to provide full scientific insight. A fundamental way to analyze a simulation is to determine the underlying free energy landscape, i.e., the free energy as a function of one or more reaction coordinates (RCs), collective variables (CVs), or order parameters.[1,5,7−10] Generally, one is interested in finding free energy minima or metastable states, pathways, transition states (TS), and free energy barriers. The major difficulty in such an analysis is the selection of appropriate RCs. A poorly chosen RC can result in a misleadingly simple free energy landscape with missing minima and absent or underestimated barriers.[3,7] Experience has shown that RCs chosen based on intuition or using common methods such as principal component analysis (PCA) are usually suboptimal. A large number of methods have been suggested to determine good RCs or CVs in an

Received: January 31, 2018; Published: May 23, 2018
