Abstract

An algorithm is presented for the coupled-cluster singles, doubles, and perturbative triples correction [CCSD(T)] method based on the density-fitting, or resolution-of-the-identity (RI), approximation for performing calculations on heterogeneous computing platforms composed of multicore CPUs and graphics processing units (GPUs). The directive-based approach to GPU offloading offered by the OpenMP application programming interface has been employed to adapt the most compute-intensive terms in the RI-CCSD amplitude equations, whose computational costs scale as high powers of N_O and N_V (the numbers of correlated occupied and virtual orbitals, respectively), as well as the perturbative triples correction, to execute on GPU architectures. The pertinent tensor contractions are performed using an accelerated math library such as cuBLAS or hipBLAS. Optimal strategies are discussed for splitting large data arrays into tiles that fit into the relatively small memory space of the GPUs, while also minimizing the low-bandwidth CPU-GPU data transfers. The performance of the hybrid CPU-GPU RI-CCSD(T) code is demonstrated on pre-exascale supercomputers composed of heterogeneous nodes equipped with NVIDIA Tesla V100 and A100 GPUs and on the world's first exascale supercomputer, "Frontier", whose nodes are equipped with AMD MI250X GPUs. Speedups in the range of 4-8× relative to the recently reported CPU-only algorithm are obtained for the GPU-offloaded terms in the RI-CCSD amplitude equations. Applications to polycyclic aromatic hydrocarbons containing 16-66 carbon atoms demonstrate that the speedup of the hybrid CPU-GPU code for the perturbative triples correction relative to the CPU-only code increases with molecule size, reaching 5.7× for the largest molecule studied, circumovalene (C66H20). The GPU-offloaded code enables the computation of the perturbative triples correction for the C60 molecule with the cc-pVDZ/aug-cc-pVTZ-RI basis sets in 7 min on Frontier using 12,288 AMD GPUs with a parallel efficiency of 83.1%.
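
As an illustration of the offloading and tiling strategy summarized above, the following C sketch shows the general pattern only and is not the published implementation: the routine name tiled_contraction, the array names B, D, and C, and the choice to tile over a single index are assumptions made for this example. A reusable operand is mapped to the GPU once with an OpenMP target-data region, tiles of the remaining arrays are streamed through device memory, and each tile-level contraction is handed to cuBLAS (on AMD GPUs the cublas* calls would be replaced by their hipBLAS counterparts).

/*
 * Illustrative sketch only -- not the authors' code.  Pattern: split a large
 * array into tiles, map each tile to the GPU with OpenMP target-data
 * directives, and perform the contraction (here a plain matrix product)
 * with cuBLAS.  All names are hypothetical.
 */
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

/* Compute C(i,a) += sum_x B(i,x) * D(x,a), tiling over the index a so that
 * only one tile of D and C resides on the GPU at a time (row-major storage). */
void tiled_contraction(const double *B, const double *D, double *C,
                       int ni, int nx, int na, int tile)
{
    cublasHandle_t handle;
    cublasCreate(&handle);

    /* B is reused by every tile: transfer it to the GPU once. */
    #pragma omp target data map(to: B[0:(size_t)ni*nx])
    {
        for (int a0 = 0; a0 < na; a0 += tile) {
            int w = (a0 + tile <= na) ? tile : na - a0;  /* width of this tile */

            /* Pack the current tile of D contiguously on the host. */
            double *Dt = malloc((size_t)nx * w * sizeof(double));
            double *Ct = calloc((size_t)ni * w, sizeof(double));
            for (int x = 0; x < nx; ++x)
                for (int a = 0; a < w; ++a)
                    Dt[(size_t)x*w + a] = D[(size_t)x*na + a0 + a];

            /* Move only this tile across the CPU-GPU link; contract on the GPU. */
            #pragma omp target data map(to: Dt[0:(size_t)nx*w]) \
                                    map(tofrom: Ct[0:(size_t)ni*w])
            {
                #pragma omp target data use_device_ptr(B, Dt, Ct)
                {
                    const double one = 1.0;
                    /* Row-major C = B*Dt written as column-major Ct^T = Dt^T * B^T. */
                    cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                                w, ni, nx, &one, Dt, w, B, nx, &one, Ct, w);
                    cudaDeviceSynchronize();  /* finish before Ct is copied back */
                }
            }

            /* Scatter the finished tile back into the full result array. */
            for (int i = 0; i < ni; ++i)
                for (int a = 0; a < w; ++a)
                    C[(size_t)i*na + a0 + a] += Ct[(size_t)i*w + a];
            free(Dt);
            free(Ct);
        }
    }
    cublasDestroy(handle);
}

Keeping the large, reusable operand resident on the device while only tiles of the remaining arrays cross the CPU-GPU link is the design choice that limits the low-bandwidth data transfers the abstract refers to.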
