A new version of MODYLAS, a highly parallelized general-purpose molecular dynamics (MD) simulation program, was developed to achieve high performance on the supercomputer Fugaku. Benchmark tests on Fugaku demonstrated highly efficient communication, single instruction, multiple data (SIMD) processing, and on-cache arithmetic operations, and performance degraded only slightly even at high degrees of parallelization. In particular, a newly developed minimum transferred data method performed exceptionally well because it requires far less data transfer than conventional communication schemes. During each MD step, the coordinates and forces of 101 810 176 atoms and the multipole coefficients of the subcells were distributed to 32 768 nodes (1 572 864 cores) in 2.3 ms. The SIMD effective instruction rates for floating-point arithmetic operations measured on Fugaku were 78.7% for the direct force calculation and 31.5% for the fast multipole method (FMM) calculation. A newly developed data reuse algorithm improved on-cache processing: the cache miss rates for the direct force and FMM calculations were only 2.74% and 1.43%, respectively, in the L1 cache, and 0.08% and 0.60%, respectively, in the L2 cache. For the aforementioned large system, the modified MODYLAS completed a single MD time step within 8.5 ms. In addition, the program provides numerous functions for materials research, including free energy calculations, generation of various statistical ensembles, and molecular constraints.
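As a rough sanity check on the scale of the benchmark, the quoted figures can be converted into per-node and per-core loads. The short Python sketch below uses only numbers stated in the abstract. It assumes that the 2.3 ms distribution time is counted within the 8.5 ms step, which the abstract does not state explicitly. The variable names are illustrative and are not taken from MODYLAS.

```python
# Back-of-the-envelope load figures derived from the benchmark numbers
# quoted in the abstract. Names are illustrative, not MODYLAS identifiers.
n_atoms = 101_810_176
n_nodes = 32_768
n_cores = 1_572_864
t_comm_ms = 2.3   # time to distribute coordinates, forces, multipole coefficients
t_step_ms = 8.5   # "within 8.5 ms", so an upper bound on the full MD step

# Atoms per node: check whether the system divides evenly across nodes.
atoms_per_node, remainder = divmod(n_atoms, n_nodes)

# Cores per node, and the average number of atoms each core handles.
cores_per_node = n_cores // n_nodes
atoms_per_core = atoms_per_node / cores_per_node

# ASSUMPTION: the 2.3 ms is part of the 8.5 ms step. Because the step time
# is an upper bound, this ratio is a lower bound on the communication share.
comm_fraction = t_comm_ms / t_step_ms

print(f"atoms per node : {atoms_per_node} (remainder {remainder})")
print(f"cores per node : {cores_per_node}")
print(f"atoms per core : {atoms_per_core:.1f}")
print(f"communication share of one step (lower bound): {comm_fraction:.0%}")
```

With these inputs the script reports 3107 atoms per node (the division is exact), 48 cores per node, about 64.7 atoms per core, and a communication share of about 27% of the step. This illustrates how little work each core receives at this degree of parallelization, which is why communication volume and latency dominate the design.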