Abstract

The limits of molecular dynamics (MD) simulations of macromolecules are steadily pushed forward by the relentless development of computer architectures and algorithms. The consequent explosion in the number and extent of MD trajectories induces the need for automated methods to rationalize the raw data and make quantitative sense of them. Recently, an algorithmic approach was introduced by some of us to identify the subset of a protein’s atoms, or mapping, that enables the most informative description of the system. This method relies on the computation, for a given reduced representation, of the associated mapping entropy, that is, a measure of the information loss due to such simplification; albeit relatively straightforward, this calculation can be time-consuming. Here, we describe the implementation of a deep learning approach aimed at accelerating the calculation of the mapping entropy. We rely on Deep Graph Networks, which provide extreme flexibility in handling structured input data and whose predictions prove to be accurate and-remarkably efficient. The trained network produces a speedup factor as large as 105 with respect to the algorithmic computation of the mapping entropy, enabling the reconstruction of its landscape by means of the Wang–Landau sampling scheme. Applications of this method reach much further than this, as the proposed pipeline is easily transferable to the computation of arbitrary properties of a molecular structure.

Highlights

  • Molecular dynamics (MD) simulations (Alder and Wainwright, 1959; Karplus, 2002) are an essential and extremely powerful tool in the computer-aided investigation of matter

  • In Mapping entropy we summarize the mapping entropy protocol for optimizing CG representations presented in Giulini et al (2020); in Protein structures and data sets we briefly describe the two proteins analyzed in this work as well as the data sets fed to the machine learning architecture; in Data Representation and Machine Learning model we illustrate our choice for the representation of the input data, together with theoretical and computational details about Deep Graph Networks (DGNs); in Wang–Landau Sampling we describe our implementation of the Wang–Landau sampling algorithm as applied to the reconstruction of the mapping entropy landscape of a system

  • We first analyze the results achieved by DGNs in predicting the mapping entropy associated to a choice of the CG representation of the two investigated proteins; we employ the R2 score as the main evaluation metric and the mean average error (MAE) as an additional measure to assess the quality of our model in fitting Smap data

Read more

Summary

Introduction

Molecular dynamics (MD) simulations (Alder and Wainwright, 1959; Karplus, 2002) are an essential and extremely powerful tool in the computer-aided investigation of matter. The usage of classical, allatom simulations has boosted our understanding of a boundless variety of different physical systems, ranging from materials (metals, alloys, fluids, etc.) to biological macromolecules such as proteins. The latest software and hardware developments have pushed the size of systems that MD simulations can address to the millions of atoms (Singharoy et al, 2019), and the time scales covered by a single run can approach the millisecond for relatively small molecules (Shaw et al, 2009). The simulation setup has to be constructed, which is another rather nontrivial step (Kandt et al, 2007). The output has to be analyzed and rationalized in order to extract information from the data

Objectives
Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call