Recent advances have made all-atom molecular dynamics (MD) a powerful tool for sampling the conformational energy landscape. However, three major challenges remain in applying MD to biological systems: force-field accuracy, accessible time scale, and the analysis of simulation trajectories. Significant progress on the first two challenges has been made and extensively reviewed elsewhere. This Account focuses on strategies for analyzing simulation data of biomolecules and also covers ways to properly design simulations and validate their results. In particular, we examine an approach named comparative perturbed-ensembles analysis, which we developed to efficiently detect dynamics in protein MD simulations that can be linked to biological function. In our recent studies, we applied this approach to understand allosteric regulation in several disease-associated human proteins. The central task of a comparative perturbed-ensembles analysis is to compare two or more conformational ensembles of a system generated by MD simulations under distinct perturbation conditions. Perturbations can be sequence variations, different ligand-binding conditions, or other physical or chemical modifications of the system. Each simulation is long enough (e.g., microsecond-long) to ensure sufficient sampling of the local substate. Sophisticated bioinformatic and statistical tools are then applied to extract function-related information from the simulation data, including principal component analysis, residue-residue contact analysis, difference contact network analysis (dCNA) based on graph theory, and statistical analysis of side-chain conformations. Computational findings are further validated with experimental data. By comparing distinct conformational ensembles, functional micro- to millisecond dynamics can be inferred.
In contrast, such time scales are difficult to reach in a single simulation; even when they are reached for a single condition of a system, it remains unclear which dynamical motions are related to function without, at a minimum, a comparison such as that between the free and substrate-bound protein. We illustrate our approach with three examples. First, we discuss its use in identifying allosteric pathways in cyclophilin A (CypA), a member of a ubiquitous class of peptidyl-prolyl cis-trans isomerase enzymes. By comparing side-chain torsion-angle distributions of CypA in wild-type and mutant forms, we identified three pathways: two are consistent with recent nuclear magnetic resonance experiments, whereas the third is novel. Second, we show how the approach enables a dynamical-evolution analysis of the human cyclophilin family. In this analysis, both conserved and divergent conformational dynamics across three cyclophilin isoforms (CypA, CypD, and CypE) were summarized. The conserved dynamics led to the discovery of allosteric networks resembling those found in CypA. A residue-wise determinant underlying the unique dynamics of CypD was also detected and validated with additional mutational MD simulations. Third, we applied the approach to elucidate a peptide sequence-dependent allosteric mechanism in human Pin1, a phosphorylation-dependent peptidyl-prolyl isomerase. Finally, we present our outlook on future directions; in particular, we envisage how the approach could help open a new avenue in drug discovery.
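
As a rough illustration of the central comparison step, the sketch below contrasts residue-residue contact frequencies between two ensembles. It is a simplified stand-in, not the dCNA method itself: the function names (`contact_frequency`, `difference_contacts`), the 4.5 Å cutoff, the 0.5 frequency-change threshold, the one-point-per-residue representation, and the toy coordinates are all assumptions made for illustration.

```python
import numpy as np


def contact_frequency(traj, cutoff=4.5):
    """Fraction of frames in which each residue pair lies within `cutoff`.

    traj has shape (n_frames, n_residues, 3): one representative point per
    residue (an assumption; real analyses often use heavy-atom distances).
    """
    traj = np.asarray(traj, dtype=float)
    # Pairwise displacement vectors per frame: (n_frames, n_res, n_res, 3)
    diff = traj[:, :, None, :] - traj[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    freq = (dist < cutoff).mean(axis=0)
    np.fill_diagonal(freq, 0.0)  # ignore self-contacts
    return freq


def difference_contacts(traj_a, traj_b, cutoff=4.5, min_change=0.5):
    """Residue pairs whose contact frequency differs by >= min_change (B minus A)."""
    delta = contact_frequency(traj_b, cutoff) - contact_frequency(traj_a, cutoff)
    i_idx, j_idx = np.triu_indices(delta.shape[0], k=1)
    return [
        (int(i), int(j), float(delta[i, j]))
        for i, j in zip(i_idx, j_idx)
        if abs(delta[i, j]) >= min_change
    ]


# Toy ensembles (hypothetical coordinates): 2 frames, 3 residues each.
# In ensemble A residues 0 and 1 are in contact; in ensemble B residues 1 and 2 are.
ens_a = np.array([[[0, 0, 0], [3, 0, 0], [20, 0, 0]]] * 2, dtype=float)
ens_b = np.array([[[0, 0, 0], [10, 0, 0], [12, 0, 0]]] * 2, dtype=float)

changes = difference_contacts(ens_a, ens_b)
print(changes)
```

Each returned tuple gives a residue pair and the change in its contact frequency from the first ensemble to the second; in practice the inputs would be coordinates extracted from the MD trajectories of, for example, wild-type and mutant systems.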