Abstract

A novel implementation of the self-consistent field (SCF) procedure specifically designed for high-performance execution on multiple graphics processing units (GPUs) is presented. The algorithm offloads to GPUs the three major computational stages of the SCF, namely, the calculation of one-electron integrals, the calculation and digestion of electron repulsion integrals, and the diagonalization of the Fock matrix, including SCF acceleration via DIIS. Performance results for a variety of test molecules and basis sets show remarkable speedups with respect to the state-of-the-art parallel GAMESS CPU code and relative to other widely used GPU codes for both single and multi-GPU execution. The new code outperforms all existing multi-GPU implementations when using eight V100 GPUs, with speedups relative to Terachem ranging from 1.2× to 3.3× and speedups of up to 28× over QUICK on one GPU and 15× using eight GPUs. Strong scaling calculations show nearly ideal scalability up to 8 GPUs while retaining high parallel efficiency for up to 18 GPUs.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call