Quantifying convergence and sufficient sampling in molecular dynamics simulations of macromolecules is frequently a source of controversy (and of various ad hoc solutions) in the field. The only reasonable, consistent, and satisfying way to infer convergence (or the lack of it) of a molecular dynamics trajectory must be based on probability theory. Ideally, the question we would wish to answer is the following: "What is the probability that a molecular configuration important for the analysis at hand has not yet been observed?" Here we propose a method for answering a variant of this question by using the Good-Turing formalism for estimating the frequency of unobserved species in a sample. Although several approaches may be followed to discretize the configurational space, in this work we use the classical RMSD matrix as a means of answering the following question: "What is the probability that a molecular configuration with an RMSD (from all already observed configurations) higher than a given threshold has not actually been observed?" We apply the proposed method to several different trajectories and show that the procedure appears to be both computationally stable and internally consistent. A free, open-source program implementing these ideas is immediately available for download via public repositories.
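To make the idea concrete, the following is a minimal sketch, not the authors' program. It assumes that frames are grouped into "species" by a simple greedy leader clustering on a precomputed RMSD matrix (the abstract does not specify the discretization scheme), and it uses the basic Good-Turing estimate of the total probability of unobserved species, P0 ≈ N1/N, where N1 is the number of species seen exactly once and N is the number of frames. The function names, the toy matrix, and the threshold are all hypothetical.

```python
from collections import Counter


def assign_clusters(rmsd, threshold):
    """Greedy leader clustering (an assumed discretization, for illustration).

    Each frame joins the first existing leader whose RMSD to it is within
    `threshold`; otherwise the frame becomes a new leader.
    """
    leaders = []  # frame indices acting as cluster representatives
    labels = []   # cluster label assigned to each frame
    for i in range(len(rmsd)):
        for k, lead in enumerate(leaders):
            if rmsd[i][lead] <= threshold:
                labels.append(k)
                break
        else:
            leaders.append(i)
            labels.append(len(leaders) - 1)
    return labels


def unseen_probability(labels):
    """Basic Good-Turing estimate of the unobserved probability mass, N1 / N."""
    if not labels:
        raise ValueError("at least one frame is required")
    counts = Counter(labels)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(labels)


# Toy symmetric RMSD matrix (hypothetical values, in angstroms) for four frames.
toy_rmsd = [
    [0.0, 0.5, 0.8, 3.0],
    [0.5, 0.0, 0.9, 3.1],
    [0.8, 0.9, 0.0, 2.9],
    [3.0, 3.1, 2.9, 0.0],
]
toy_labels = assign_clusters(toy_rmsd, threshold=1.0)
toy_p0 = unseen_probability(toy_labels)
```

In this toy case three frames fall into one cluster and one frame forms a singleton, so the estimate is 1/4. With a real trajectory, an estimate that stays close to zero as more frames are added would suggest that few configurations beyond the chosen RMSD threshold remain unseen.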