Abstract

The numerical integration of the exchange–correlation (XC) potential is one of the primary computational bottlenecks in Gaussian basis set Kohn–Sham density functional theory (KS-DFT). To achieve optimal performance and accuracy, care must be taken in this numerical integration to preserve local sparsity so as to allow near-linear weak scaling with system size. This leads to an integration scheme with several performance-critical kernels which must be hand-optimized for each architecture of interest. As the set of available accelerator hardware grows more diverse, a key challenge for developers of KS-DFT software is to maintain performance portability across a wide range of computational architectures. In this work, we examine a modular software design pattern which decouples the implementation details of performance-critical kernels from the expression of high-level algorithmic workflows in a device-agnostic language such as C++, thus allowing developers to target existing and emerging accelerator hardware within a single code base. We assess the efficacy of such a design pattern in the numerical integration of the XC potential by demonstrating its ability to achieve performance portability across a set of accelerator architectures which are representative of those on current and future U.S. Department of Energy Leadership Computing Facilities.
