Markov State Models (MSMs) have been widely applied to understand protein folding mechanisms by predicting long-timescale dynamics from ensembles of short molecular simulations. Most MSM estimators enforce detailed balance, assuming that trajectory data are sampled at equilibrium. This is rarely the case for ab initio folding studies, however, and MSMs built from such data can therefore severely underestimate protein folding stabilities. To remedy this problem, we have developed an enhanced-sampling protocol in which (1) unbiased folding simulations are performed and sparse time-lagged independent component analysis (tICA) is used to obtain features that best capture the slowest events in folding, (2) umbrella sampling along the resulting reaction coordinate is performed to observe folding and unfolding transitions, and (3) the thermodynamics and kinetics of folding are estimated using multiensemble Markov models (MEMMs). Using this protocol, folding pathways, rates, and stabilities of a designed α-helical hairpin, Z34C, can be predicted in good agreement with experimental measurements. These results indicate that accurate simulation-based estimates of absolute folding stabilities are within reach, with implications for the computational design of folded miniproteins and peptidomimetics.
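The detailed-balance assumption mentioned above can be illustrated with a toy example. The following is a minimal sketch, not the MEMM estimator used in the study: it builds a two-state MSM from a made-up discrete trajectory, enforces detailed balance by naively symmetrizing the transition counts (a simple stand-in for a maximum-likelihood reversible estimator), and converts the stationary populations into a folding free energy. The function names, the toy trajectory, the state labels (0 = unfolded, 1 = folded), and the choice of reporting the free energy in units of kT are all assumptions made for illustration.

```python
import numpy as np

def count_matrix(dtraj, n_states, lag=1):
    """Count transitions i -> j observed at the given lag time."""
    C = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        C[i, j] += 1
    return C

def reversible_msm(C):
    """Naive detailed-balance estimator: symmetrize counts, then row-normalize.

    Assumes every state has at least one count. For symmetrized counts the
    stationary distribution is proportional to the row sums.
    """
    C_sym = C + C.T
    row_sums = C_sym.sum(axis=1)
    T = C_sym / row_sums[:, None]
    pi = row_sums / row_sums.sum()
    return T, pi

def folding_free_energy(pi, folded, unfolded):
    """Free energy of folding in units of kT: -ln(p_folded / p_unfolded)."""
    return -np.log(pi[folded].sum() / pi[unfolded].sum())

# Hypothetical toy trajectory: 0 = unfolded, 1 = folded.
dtraj = np.array([0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0])

C = count_matrix(dtraj, n_states=2, lag=1)
T, pi = reversible_msm(C)
dG = folding_free_energy(pi, folded=[1], unfolded=[0])

print("transition matrix:\n", T)
print("stationary populations:", pi)
print("folding free energy (kT):", dG)
```

Because the stationary populations here are fixed entirely by the observed counts, an ensemble of short trajectories that start unfolded and rarely reach the folded state would bias the estimated stability toward the unfolded state, which is the kind of underestimate the protocol above is meant to correct.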