Abstract

We present a parallel implementation of the ReaxFF force field on massively parallel heterogeneous architectures, called PuReMD-Hybrid. PuReMD, on which this work is based, along with its integration into LAMMPS, is currently used by a large number of research groups worldwide. Accelerating this important community codebase that implements a complex reactive force field poses a number of algorithmic, design, and optimization challenges, as we discuss in detail. In particular, different computational kernels are best suited to different computing substrates-CPUs or GPUs. Scheduling these computations requires complex resource management, as well as minimizing data movement across CPUs and GPUs. Integrating powerful nodes, each with multiple CPUs and GPUs, into clusters and utilizing the immense compute power of these clusters requires significant optimizations for minimizing communication and, potentially, redundant computations. From a programming model perspective, PuReMD-Hybrid relies on MPI across nodes, pthreads across cores, and CUDA on the GPUs to address these challenges. Using a variety of innovative algorithms and optimizations, we demonstrate that our code can achieve over 565-fold speedup compared to a single core implementation on a cluster of 36 state-of-the-art GPUs for complex systems. In terms of application performance, our code enables simulations of over 1.8M atoms in under 0.68 seconds per simulation time step.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call