Molecular dynamics (MD) simulations are immensely valuable for studying protein structure, function and dynamics. Their ability to capture the atomic-level behavior of molecules and describe their evolution over time makes them a powerful, synergistic tool for biochemistry, structural biology and other life sciences. To advance research on reasonable timescales, researchers must maximize the useful information extracted from simulation data while managing computational resources frugally. Often, this involves balancing the length of MD trajectories against the number of replicas of a given system, with the aim of maximizing sampling of the conformational landscape. However, identifying this balance is not always intuitive, and the lack of standards among researchers can produce large variability in the results and predictions derived from MD measurements. Here, we investigate the variability in MD results when simulation length and replica number are varied. Using a 231-amino acid domain, we compare measurements from independent trajectories to a benchmark of three 1000-ns replicates. We perform these simulations on 27 protein-ligand complexes, allowing us to compare ligand-specific rankings of complexes across independent replicas. Our results reveal that some MD measurements are accurately ranked by single trajectories, while others are not. We uncover similar variability in the effects of trajectory length on measurements. Our findings suggest that a one-size-fits-all approach to MD simulations is not necessarily the best, and that, depending on the intended measurements and research question, it may sometimes be advantageous to prioritize longer trajectories over multiple replicas. This work provides important considerations for researchers designing simulation studies.
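
One way to quantify how well a single trajectory reproduces a benchmark ranking of complexes is a rank correlation. The sketch below is illustrative only: the abstract does not state which statistic the study uses, so the choice of Spearman's coefficient is an assumption, the function names `ranks` and `spearman` are hypothetical, and the numbers are placeholders rather than data from the study.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of ranks.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Placeholder per-complex measurements (NOT values from the study):
# one number per protein-ligand complex, e.g. a mean structural metric.
benchmark = [1.2, 2.5, 0.8, 3.1, 1.9]       # averaged over benchmark replicates
single_replica = [1.4, 2.2, 1.0, 3.0, 2.4]  # from one independent trajectory

rho = spearman(benchmark, single_replica)
print(f"Spearman rho vs. benchmark: {rho:.2f}")
```

A coefficient near 1 would indicate that the single trajectory orders the complexes much as the benchmark does; repeating the comparison for each measurement type and trajectory length would show which measurements are robust to reduced sampling.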