Artificial neural network potentials (ANNPs), obtained by training on a large database of first-principles calculations, have become popular in molecular dynamics (MD) simulation because they can accurately capture physical and chemical properties. However, the complex procedure and heavy data dependence of the implementation degrade the performance of CPU-only runs, which limits their application. In this contribution, we report a flexible computation method for ANNPs in LAMMPS, in which the simulation box is divided into several parts according to the resources available on the accelerator, such as the size of global memory and the number of work items (cores). The number of parts has little influence on performance as long as the number of atoms calculated per loop is larger than the number of work items on the device. In this approach, the forces on neighbor atoms are updated using hierarchical memory, without atomic operations. Typical dynamic and static tests are performed to validate the implementation. The results show that our approach with one graphics processing unit (GPU) is 12 to 13 times faster than CPU-only runs with 8 MPI tasks. Additionally, the implementation supports both CUDA- and OpenCL-enabled GPU cards.
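The partitioning idea can be illustrated with a minimal sketch. The function name `plan_chunks`, its parameters, and the notion of a fixed per-atom memory footprint are hypothetical and are not taken from the LAMMPS implementation; the sketch only shows how the number of parts might follow from the global memory size, and how one could check that each part still holds at least as many atoms as there are work items.

```python
import math


def plan_chunks(n_atoms, bytes_per_atom, global_mem_bytes, n_work_items):
    """Split n_atoms into parts that fit in device global memory.

    Hypothetical helper: all arguments are illustrative assumptions,
    not quantities defined by the original work.
    Returns (sizes, saturated), where sizes lists the atoms per part and
    saturated tells whether every part has >= n_work_items atoms.
    """
    if n_atoms <= 0:
        raise ValueError("n_atoms must be positive")

    # Largest number of atoms whose working data fits in global memory.
    max_atoms_per_part = global_mem_bytes // bytes_per_atom
    if max_atoms_per_part < 1:
        raise ValueError("global memory too small for a single atom")

    # Fewest parts such that each part fits in memory.
    n_parts = math.ceil(n_atoms / max_atoms_per_part)

    # Spread the atoms as evenly as possible over the parts.
    base, extra = divmod(n_atoms, n_parts)
    sizes = [base + (1 if i < extra else 0) for i in range(n_parts)]

    # Each loop keeps all work items busy only if every part is at least
    # as large as the number of work items on the device.
    saturated = all(size >= n_work_items for size in sizes)
    return sizes, saturated


if __name__ == "__main__":
    sizes, saturated = plan_chunks(
        n_atoms=10_000,
        bytes_per_atom=1_000,
        global_mem_bytes=4_000_000,
        n_work_items=2_048,
    )
    print(sizes, saturated)
```

With these illustrative numbers the box is split into three parts, each larger than the assumed number of work items, which corresponds to the regime in which the number of parts is reported to have little influence on performance.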