Abstract

The introduction of accelerator devices such as graphics processing units (GPUs) has had a profound impact on molecular dynamics simulations and has enabled order-of-magnitude performance advances using commodity hardware. To fully reap these benefits, it has been necessary to reformulate some of the most fundamental algorithms, including the Verlet list, pair searching, and cutoffs. Here, we present the heterogeneous parallelization and acceleration design of molecular dynamics implemented in the GROMACS codebase over the last decade. The setup involves a general cluster-based approach to pair lists and non-bonded pair interactions that utilizes both GPU and central processing unit (CPU) single instruction, multiple data (SIMD) acceleration efficiently, including the ability to load-balance tasks between CPUs and GPUs. The algorithm work efficiency is tuned for each type of hardware, and to use accelerators more efficiently, we introduce dual pair lists with rolling pruning updates. Combined with new direct GPU-GPU communication and GPU integration, this enables excellent performance from single-GPU simulations through strong scaling across multiple GPUs and efficient multi-node parallelization.
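The dual pair list with rolling pruning mentioned in the abstract can be illustrated with a short sketch: an outer list is built infrequently with a longer cutoff, and a cheaper pruning pass regenerates the inner interaction list from it every few steps. The code below is a minimal C++ sketch of that idea under simplifying assumptions (atom pairs rather than the atom clusters GROMACS actually uses, and a brute-force O(N^2) build instead of spatial gridding); all names are hypothetical, not the GROMACS API.

// Minimal sketch of a dual pair list with periodic pruning. Hypothetical
// names; GROMACS operates on atom clusters, not individual atom pairs.
#include <vector>

struct Vec3 { float x, y, z; };
struct Pair { int i, j; };

static float dist2(const Vec3& a, const Vec3& b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Outer list: built rarely, with the longer cutoff rOuter, so it remains
// valid while atoms diffuse (brute force here for brevity).
std::vector<Pair> buildOuterList(const std::vector<Vec3>& x, float rOuter) {
    std::vector<Pair> list;
    for (int i = 0; i < (int)x.size(); ++i)
        for (int j = i + 1; j < (int)x.size(); ++j)
            if (dist2(x[i], x[j]) < rOuter * rOuter) list.push_back({i, j});
    return list;
}

// Inner list: re-pruned from the outer list every few steps with the
// shorter cutoff rInner; far cheaper than a full pair search.
std::vector<Pair> pruneList(const std::vector<Pair>& outer,
                            const std::vector<Vec3>& x, float rInner) {
    std::vector<Pair> inner;
    for (const Pair& p : outer)
        if (dist2(x[p.i], x[p.j]) < rInner * rInner) inner.push_back(p);
    return inner;
}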

Highlights

  • Molecular dynamics (MD) simulation has had tremendous success in a number of application areas in the past two decades, in part due to hardware improvements that have enabled studies of systems and timescales that were previously not feasible

  • The original Message Passing Interface (MPI)- and Parallel Virtual Machine (PVM)-based scaling was less impressive, but in version 4.0 [8] this was replaced with a state-of-the-art neutral-territory domain decomposition [27] combined with fully flexible 3D dynamic load balancing (DLB) of triclinic domains. This is combined with a high-level task decomposition that dedicates a subset of MPI ranks to long-range particle mesh Ewald (PME) electrostatics to reduce the cost of collective communication required by the 3D FFTs, which amounts to multiple-program, multiple-data (MPMD) parallelization (see the MPI sketch after this list)

  • On the central processing unit (CPU) front, SIMD parallelism is used for most major time-consuming parts of the code. This was necessitated by Amdahl’s law: as the performance of non-bonded kernels and PME improved, previously insignificant components such as integration turned into new bottlenecks. This was made fully portable by the introduction of the GROMACS SIMD abstraction layer, which started as the replacement of raw assembly with intrinsics and supports a range of CPU architectures using 14 different SIMD instruction sets [28], with additional ones in development (see the SIMD sketch after this list)
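To make the MPMD task decomposition in the second highlight concrete, the sketch below splits MPI_COMM_WORLD into a dedicated PME group and a particle-particle (PP) group with MPI_Comm_split, so that the 3D-FFT collectives only span the small PME group. The 1:4 ratio and the loop names are illustrative assumptions; GROMACS selects the PME rank count automatically, and this is not its actual code.

// Hedged sketch: dedicating a subset of MPI ranks to long-range PME work
// (multiple-program, multiple-data). Ratio and names are illustrative.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Reserve roughly a quarter of the ranks for PME; the rest do
    // short-range particle-particle (PP) work.
    int numPmeRanks = size / 4 > 0 ? size / 4 : 1;
    int isPme = rank < numPmeRanks ? 1 : 0;

    // Split into two communicators so the collective communication of the
    // 3D FFTs involves only the (small) PME group.
    MPI_Comm groupComm;
    MPI_Comm_split(MPI_COMM_WORLD, isPme, rank, &groupComm);

    std::printf("rank %d runs the %s loop\n", rank, isPme ? "PME" : "PP");
    // if (isPme) pmeLoop(groupComm);  // hypothetical: spread, 3D FFT, solve
    // else       ppLoop(groupComm);   // hypothetical: pair + bonded forces

    MPI_Comm_free(&groupComm);
    MPI_Finalize();
    return 0;
}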
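The SIMD abstraction layer in the third highlight follows a common pattern: kernels are written once against a portable vector type whose implementation is chosen per instruction set at compile time. The sketch below shows that pattern for just AVX plus a scalar fallback; the type and function names are hypothetical and far simpler than the real GROMACS layer.

// Hedged sketch of a SIMD abstraction layer: one portable vector type,
// implemented per instruction set behind preprocessor checks.
#if defined(__AVX__)
#include <immintrin.h>
struct SimdFloat { __m256 v; static constexpr int width = 8; };
static inline SimdFloat simdLoad(const float* p) { return {_mm256_loadu_ps(p)}; }
static inline SimdFloat simdAdd(SimdFloat a, SimdFloat b) { return {_mm256_add_ps(a.v, b.v)}; }
static inline void simdStore(float* p, SimdFloat a) { _mm256_storeu_ps(p, a.v); }
#else
// Scalar fallback keeps the same interface, so kernels compile everywhere.
struct SimdFloat { float v; static constexpr int width = 1; };
static inline SimdFloat simdLoad(const float* p) { return {*p}; }
static inline SimdFloat simdAdd(SimdFloat a, SimdFloat b) { return {a.v + b.v}; }
static inline void simdStore(float* p, SimdFloat a) { *p = a.v; }
#endif

// A kernel written once against the abstract interface runs at full vector
// width on any supported instruction set (remainder loop omitted).
void addArrays(const float* a, const float* b, float* out, int n) {
    for (int i = 0; i + SimdFloat::width <= n; i += SimdFloat::width)
        simdStore(out + i, simdAdd(simdLoad(a + i), simdLoad(b + i)));
}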


INTRODUCTION

Molecular dynamics (MD) simulation has had tremendous success in a number of application areas in the past two decades, in part due to hardware improvements that have enabled studies of systems and timescales that were previously not feasible. By employing state-of-the-art algorithms and efficient parallel implementations, the GROMACS code is able to target hardware and efficiently parallelize from the lowest level of SIMD (single instruction, multiple data) vector units to multiple cores and caches, accelerators, and distributed-memory HPC resources. We believe that this approach makes great use of limited compute resources to improve research productivity, and it is increasingly enabling higher absolute performance on any given resource. While there has been some convergence of architectures, the difference between latency- and throughput-optimized functional units is fundamental, and utilizing each of them for the tasks at which they are best suited requires heterogeneous parallelization. This typically employs the CPU for scheduling work, transferring data, and launching computation on the accelerator, as well as inter- and intra-node communication (a sketch of this pattern follows below).
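As an illustration of that division of labor, the sketch below has the CPU enqueue a host-to-device copy and a placeholder non-bonded kernel on a CUDA stream, overlap independent CPU work with the GPU computation, and synchronize only when the forces are needed. This is a minimal sketch of the generic offload pattern, not GROMACS code; nonbondedKernel, doStep, and the buffer names are assumptions.

// Hedged sketch of the CPU-as-scheduler offload pattern with the CUDA
// runtime API. Kernel and data layout are illustrative placeholders.
#include <cuda_runtime.h>

// Hypothetical stand-in for a non-bonded force kernel.
__global__ void nonbondedKernel(const float4* coords, float3* forces, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) forces[i] = make_float3(0.f, 0.f, 0.f);  // real kernel: pair forces
}

void doStep(const float4* hCoords, float3* hForces,
            float4* dCoords, float3* dForces, int n, cudaStream_t stream) {
    // The CPU enqueues the transfer and the kernel asynchronously and
    // returns immediately; the GPU drains the stream in order.
    cudaMemcpyAsync(dCoords, hCoords, n * sizeof(float4),
                    cudaMemcpyHostToDevice, stream);
    nonbondedKernel<<<(n + 127) / 128, 128, 0, stream>>>(dCoords, dForces, n);
    cudaMemcpyAsync(hForces, dForces, n * sizeof(float3),
                    cudaMemcpyDeviceToHost, stream);

    // ... CPU-side tasks (bonded forces, inter-node communication) overlap
    // with the GPU work here ...

    // Block only when the GPU results are actually required.
    cudaStreamSynchronize(stream);
}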

COMPUTATIONAL CHALLENGES IN MD SIMULATIONS
The structure of the MD algorithm
Multi-level parallelism
HETEROGENEOUS PARALLELIZATION
Offloading force computation
Offloading complete MD iterations
The cluster pair algorithm
Non-bonded pair interaction kernel throughput
The pair list generation algorithm
Dual pair list with dynamic pruning
Multi-level load balancing
Benchmark systems
Findings
DISCUSSION