A force-matching-based method for supervised machine learning (ML) of coarse-grained (CG) free energy (FE) potentials, known as multiscale coarse-graining via force-matching (MSCG/FM), is an efficient way to develop microscopically informed CG models that are thermodynamically and statistically equivalent to the reference microscopic models. For low-resolution models, when the coarse-graining is at supramolecular scales, objective-oriented clustering of nonbonded particles is required and the reduced description becomes a function of the clustering algorithm. In the present work, we explore the dependence of the ML of the CG Helmholtz FE potential on the clustering algorithm. We consider coarse-graining based on partitional (k-means, leading to a Voronoi diagram) and hierarchical agglomerative (bottom-up) clustering algorithms common in unsupervised ML, and we develop theory connecting the MSCG/FM-learned CG Helmholtz potential to the clustering statistics. By combining agglomerative clustering and MSCG/FM learning in a recursive manner, we propose an efficient ML methodology to develop fine-to-low-resolution hierarchies of CG models. The methodology does not suffer from degrading accuracy or increased computational cost when constructing larger hierarchies and, as such, does not impose an upper limit on the size of the CG particles resulting from the extended hierarchies. The utility of the methodology is demonstrated by obtaining the bottom-up agglomerative hierarchy for liquid nitromethane from all-atom molecular dynamics (MD) simulations. For agglomerative hierarchies, we prove the existence of renormalization group transformations that indicate self-similarity and allow the low-resolution MSCG/FM potentials to be learned at low computational cost by rescaling and renormalizing certain finer-resolution members of the hierarchy. The hierarchies of the CG models can be used to carry out simulations under constant-pressure conditions.
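As a rough illustration of the recursive bottom-up idea, the following minimal sketch builds a small agglomerative hierarchy by repeatedly merging nearest pairs of sites. It is not the authors' algorithm: the greedy nearest-pair rule, the function names `merge_nearest_pairs` and `build_hierarchy`, the toy data, and the absence of periodic boundaries are all assumptions made for illustration. Each merged site sits at the center of mass of its members and carries their summed force, which is the kind of mapped reference force that a force-matching fit would target; the fitting step itself is not shown.

```python
import math

def merge_nearest_pairs(sites):
    """One agglomerative level: greedily merge the closest pairs of sites.

    Each site is a dict with mass "m", position "r", and force "f" (3-tuples).
    Hypothetical helper; the pairing rule is an assumption, not the paper's.
    """
    n = len(sites)
    # All candidate pairs, sorted by Euclidean distance (no periodic images).
    pairs = sorted(
        (math.dist(sites[i]["r"], sites[j]["r"]), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    unused = set(range(n))
    merged = []
    for _, i, j in pairs:
        if i in unused and j in unused:
            unused -= {i, j}
            a, b = sites[i], sites[j]
            m = a["m"] + b["m"]
            # Center-of-mass position of the new, larger CG site.
            r = tuple((a["m"] * x + b["m"] * y) / m
                      for x, y in zip(a["r"], b["r"]))
            # Net force on the new site is the sum of the member forces.
            f = tuple(x + y for x, y in zip(a["f"], b["f"]))
            merged.append({"m": m, "r": r, "f": f})
    # A leftover site (odd count) is carried to the next level unchanged.
    merged.extend(sites[i] for i in sorted(unused))
    return merged

def build_hierarchy(sites, levels):
    """Apply the merge step recursively, keeping every resolution level."""
    hierarchy = [sites]
    for _ in range(levels):
        hierarchy.append(merge_nearest_pairs(hierarchy[-1]))
    return hierarchy

# Toy configuration: four unit-mass sites on a line (made-up numbers).
sites = [
    {"m": 1.0, "r": (0.0, 0.0, 0.0), "f": (1.0, 0.0, 0.0)},
    {"m": 1.0, "r": (1.0, 0.0, 0.0), "f": (-0.5, 0.0, 0.0)},
    {"m": 1.0, "r": (5.0, 0.0, 0.0), "f": (0.0, 2.0, 0.0)},
    {"m": 1.0, "r": (6.0, 0.0, 0.0), "f": (0.0, -1.0, 0.0)},
]
hierarchy = build_hierarchy(sites, levels=2)
print([len(level) for level in hierarchy])  # [4, 2, 1]
```

Total mass and total force are conserved at every level of this sketch, so each coarser level is a consistent mapping of the one below it. In a real workflow, the mapped positions and forces at each level would come from MD trajectory frames rather than a single hand-written configuration.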