In recent years, parallelism via multithreading has become extremely important to the optimization of high-performance electronic structure theory codes. Such multithreading is generally achieved via OpenMP constructs, using a fork-join threading model to enable thread-level data parallelism within the code. An alternative approach to multithreading is task-based parallelism, which offers several benefits relative to fork-join thread parallelism. A novel Restricted Hartree-Fock (RHF) algorithm, utilizing task-based parallelism to achieve optimal performance, was developed and implemented in the JuliaChem electronic structure theory software package. The new RHF algorithm uses a unique method of shell quartet batch creation, enabling construction and distribution of fine-grained shell quartet batches in a load-balanced manner using the Julia task construct. These shell quartet batches are distributed statically across Message Passing Interface (MPI) ranks and dynamically across threads within an MPI rank, requiring no explicit inter-rank or inter-thread synchronization to do so. Compared to the hybrid MPI/OpenMP RHF algorithm present in the GAMESS software package, the task-based algorithm demonstrates speedups of up to ∼40% for systems in the S22(3) test set of molecules, with system sizes up to ∼1000 basis functions. The JuliaChem algorithm demonstrates the viability of both the task-based parallelism model and the Julia programming language for the construction of performant electronic structure theory codes targeting systems of chemically relevant size.
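The two-level distribution described above can be illustrated with a minimal sketch. It is written in Python rather than Julia and is not the JuliaChem implementation: the round-robin rank assignment, the fixed batch size, the thread pool standing in for Julia tasks, and every function name (`shell_quartets`, `make_batches`, `batches_for_rank`, `process_batch`, `run_rank`) are assumptions made for illustration. MPI is simulated by passing `rank` and `nranks` as plain arguments, and the integral work is replaced by a placeholder that counts quartets.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def shell_quartets(nshells):
    """Yield symmetry-unique shell quartets (i, j, k, l).

    Uniqueness condition: i >= j, k >= l, and pair (k, l) <= pair (i, j).
    """
    for i in range(nshells):
        for j in range(i + 1):
            for k in range(i + 1):
                # When k == i, l is capped at j so that (k, l) <= (i, j).
                lmax = j if k == i else k
                for l in range(lmax + 1):
                    yield (i, j, k, l)


def make_batches(quartets, batch_size):
    """Group a stream of quartets into fixed-size batches (assumed scheme)."""
    it = iter(quartets)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def batches_for_rank(batches, rank, nranks):
    """Static distribution: each rank picks its batches by index alone,
    so no communication between ranks is needed to decide ownership."""
    return [b for idx, b in enumerate(batches) if idx % nranks == rank]


def process_batch(batch):
    """Placeholder for integral evaluation: just count the quartets."""
    return len(batch)


def run_rank(nshells, batch_size, rank, nranks, nthreads):
    """Work done by one simulated rank: idle worker threads pull the
    next pending batch from the pool's queue (dynamic distribution)."""
    all_batches = list(make_batches(shell_quartets(nshells), batch_size))
    mine = batches_for_rank(all_batches, rank, nranks)
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        return sum(pool.map(process_batch, mine))


if __name__ == "__main__":
    nranks = 3
    totals = [run_rank(4, 8, r, nranks, nthreads=2) for r in range(nranks)]
    print(totals, sum(totals))
```

For N shells there are N(N+1)/2 unique shell pairs and therefore P(P+1)/2 unique quartets with P the number of pairs, so the per-rank totals should sum to that value regardless of the number of ranks or threads. The sketch shows only the scheduling pattern; Python threads do not provide the parallel speedup of Julia tasks for compute-bound work.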