Transfer Learning from Markov Models Leads to Efficient Sampling of Related Systems.

Mohammad M Sultan, Vijay S Pande

doi:10.1021/acs.jpcb.7b06896

Open Access

https://doi.org/10.1021/acs.jpcb.7b06896

Journal: Journal of Physical Chemistry B
Publication Date: Sep 22, 2017
Citations: 15
License type: cc-by

Affiliation: Stanford University

Abstract

We recently showed that the time-structure-based independent component analysis method from Markov state model literature provided a set of variationally optimal slow collective variables for metadynamics (tICA-metadynamics). In this paper, we extend the methodology toward efficient sampling of related mutants by borrowing ideas from transfer learning methods in machine learning. Our method explicitly assumes that a similar set of slow modes and metastable states is found in both the wild type (baseline) and its mutants. Under this assumption, we describe a few simple techniques using sequence mapping for transferring the slow modes and structural information contained in the wild type simulation to a mutant model for performing enhanced sampling. The resulting simulations can then be reweighted onto the full-phase space using the multistate Bennett acceptance ratio, allowing for thermodynamic comparison against the wild type. We first benchmark our methodology by recapturing alanine dipeptide dynamics across a range of different atomistic force fields, including the polarizable Amoeba force field, after learning a set of slow modes using Amber ff99sb-ILDN. We next extend the method by including structural data from the wild type simulation and apply the technique to recapturing the effects of the GTT mutation on the FIP35 WW domain.
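The sequence-mapping step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a pairwise alignment of the wild type and mutant sequences is already available (with `-` marking gaps) and that each feature is described by the residue indices it depends on. The function names, the feature tuples, and the toy sequences are hypothetical and are not taken from the paper.

```python
from __future__ import annotations


def map_residues(wt_aligned: str, mut_aligned: str) -> dict[int, int]:
    """Map 0-based WT residue indices to mutant residue indices.

    Both strings come from the same pairwise alignment; '-' marks a gap.
    Substituted positions are still mapped, since backbone features exist there.
    """
    if len(wt_aligned) != len(mut_aligned):
        raise ValueError("aligned sequences must have equal length")
    mapping: dict[int, int] = {}
    wt_i = mut_i = 0
    for a, b in zip(wt_aligned, mut_aligned):
        if a != "-" and b != "-":
            mapping[wt_i] = mut_i
        if a != "-":
            wt_i += 1
        if b != "-":
            mut_i += 1
    return mapping


def transfer_features(wt_features, mapping):
    """Translate WT feature definitions to mutant residue numbering.

    Returns the positions of the WT features that survive (useful for selecting
    the matching rows of a WT tICA eigenvector) and the translated definitions.
    Features touching a residue with no mutant counterpart are dropped.
    """
    kept, mutant_features = [], []
    for k, (kind, residues) in enumerate(wt_features):
        if all(r in mapping for r in residues):
            kept.append(k)
            mutant_features.append((kind, tuple(mapping[r] for r in residues)))
    return kept, mutant_features


# Toy example (hypothetical sequences): the mutant lacks WT residue 3.
mapping = map_residues("ACDEFGH", "ACD-FGH")
wt_features = [("phi", (1,)), ("contact", (3, 5)), ("contact", (2, 6))]
kept, mutant_features = transfer_features(wt_features, mapping)
```

In this sketch, features that cannot be mapped are simply dropped. How the paper handles unmapped features is not stated in this excerpt.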

Highlights

Efficient sampling of protein configuration space remains an unsolved problem in computational biophysics
We propose transferring information from the wild type (WT) tICA model and MSM (Markov state model) to the mutant's Metadynamics or Umbrella sampling simulations (Figure 1). tICA is a dimensionality reduction technique [17,18,19,20] capable of finding reaction coordinates (tICs) within the dataset. MSMs are kinetic models of protein dynamics that model the dynamics as memoryless jump processes. tICA was initially used as a dimensionality reduction step [20] for defining the Markov models' state space, though it was later shown that tICA and MSMs solve the same problem [21] of approximating the underlying transfer operator, albeit with different choices of basis
The exact parameters for the well-tempered Metadynamics runs are given in SI table 1, though we empirically found that a range of parameters worked
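To make the tICA step in the highlights above concrete, here is a minimal NumPy sketch of the standard estimator: build symmetrized instantaneous and time-lagged covariance matrices, whiten, and diagonalize. It is not the authors' implementation. The synthetic two-feature trajectory, the lag time, and the name `fit_tica` are assumptions made for illustration.

```python
import numpy as np


def fit_tica(X, lag, n_components=1):
    """Return (feature mean, eigenvalues, eigenvectors) of a basic tICA model.

    X has shape (n_frames, n_features); lag is in frames and must be >= 1.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    X0, Xt = Xc[:-lag], Xc[lag:]
    n_pairs = X0.shape[0]
    # Symmetrized estimators, which assume reversible dynamics.
    C0 = (X0.T @ X0 + Xt.T @ Xt) / (2.0 * n_pairs)
    Ct = (X0.T @ Xt + Xt.T @ X0) / (2.0 * n_pairs)
    # Whiten with C0, discarding near-singular directions.
    s, U = np.linalg.eigh(C0)
    keep = s > 1e-10 * s.max()
    W = U[:, keep] / np.sqrt(s[keep])
    # Solve the symmetric eigenproblem in the whitened space.
    evals, evecs = np.linalg.eigh(W.T @ Ct @ W)
    order = np.argsort(evals)[::-1][:n_components]
    return mean, evals[order], W @ evecs[:, order]


# Synthetic "WT" data: one slow AR(1) mode mixed with fast white noise.
rng = np.random.default_rng(0)
n_frames, lag = 20000, 10
noise = rng.normal(size=n_frames)
slow = np.zeros(n_frames)
for t in range(1, n_frames):
    slow[t] = 0.99 * slow[t - 1] + noise[t]
fast = rng.normal(size=n_frames)
X_wt = np.column_stack([slow + 0.3 * fast, 0.5 * slow + fast])

mean, evals, tics = fit_tica(X_wt, lag, n_components=2)
projection = (X_wt - mean) @ tics  # frames projected onto the tICs
```

Under the transferability assumption stated in the abstract, the same `mean` and `tics` would be applied to the mutant's (sequence-mapped) features to define its collective variables, rather than refitting on mutant data.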

Summary

Introduction

Efficient sampling of protein configuration space remains an unsolved problem in computational biophysics.

Transferable tICA-Metadynamics can use the wild type simulation's structural data by coupling to an MSM structural reservoir

Up to this point, our modeling efforts have focused only on using the slow tICs from the WT simulation to sample the mutant efficiently. This might be sufficient for small peptide systems but is unlikely to work for large systems because, for example, structural features may be missing from the construction of our tICA coordinates. One option might involve starting off with a 'partially' constructed free-energy landscape such that the Metadynamics engine only has to fill in the regions that differ between the WT and the mutant
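One way to picture a 'partially' constructed landscape is to seed the mutant's bias from the WT free energy along a tIC. The sketch below is an assumption based on the standard well-tempered Metadynamics result that the converged bias approaches -(1 - 1/γ) F(s) up to a constant, where γ is the bias factor. It is not a procedure given in this excerpt. The synthetic data, the free-energy cap `f_cap`, and both function names are hypothetical.

```python
import numpy as np

KB = 0.0083144626  # Boltzmann constant in kJ/(mol K)


def free_energy_1d(tic_values, bins, temperature, weights=None):
    """Histogram-based free energy (kJ/mol) along one tIC; minimum set to 0."""
    hist, edges = np.histogram(tic_values, bins=bins, weights=weights)
    centers = 0.5 * (edges[:-1] + edges[1:])
    prob = hist / hist.sum()
    with np.errstate(divide="ignore"):
        F = -KB * temperature * np.log(prob)  # empty bins become +inf
    F -= F[np.isfinite(F)].min()
    return centers, F


def seed_bias(F, bias_factor, f_cap):
    """Initial bias V0(s) = (1 - 1/gamma) * (f_cap - min(F(s), f_cap)).

    Assumption: this mimics the converged well-tempered bias for the WT,
    shifted so that unsampled or high free-energy regions receive zero bias.
    """
    F_capped = np.minimum(F, f_cap)
    return (1.0 - 1.0 / bias_factor) * (f_cap - F_capped)


# Synthetic WT projection onto one tIC: a double well with unequal populations.
rng = np.random.default_rng(1)
wt_tic = np.concatenate([rng.normal(-1.0, 0.3, 8000), rng.normal(1.0, 0.3, 2000)])
centers, F_wt = free_energy_1d(wt_tic, bins=np.linspace(-3, 3, 61), temperature=300.0)
V0 = seed_bias(F_wt, bias_factor=10.0, f_cap=20.0)
```

The resulting grid `V0` would still need to be written in whatever format the chosen Metadynamics engine accepts for a starting bias. Where the mutant's landscape matches the WT, little further bias should need to be deposited, which is the intuition behind the passage above.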

Discussion and Conclusion

Findings

References: