The presence of many degenerate d/f orbitals makes polynuclear transition-metal compounds, such as iron-sulfur clusters in nitrogenase, challenging for state-of-the-art quantum chemistry methods. To address this challenge, we present the first distributed multi-graphics processing unit (GPU) ab initio density matrix renormalization group (DMRG) algorithm suitable for modern high-performance computing (HPC) infrastructures. The central idea is to parallelize the most computationally intensive part, the multiplication of O(K²) operators with a trial wave function (where K is the number of spatial orbitals), by combining operator parallelism for distributing the workload with a batched algorithm for performing contractions on GPUs. With this new implementation, we are able to reach an unprecedentedly large bond dimension of D = 14,000 on 48 GPUs (NVIDIA A100 80 GB SXM) for an active space model (114 electrons in 73 active orbitals) of the P-cluster, which is nearly three times larger than the bond dimensions reported in previous DMRG calculations for the same system using only central processing units (CPUs).
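The combination of operator parallelism and batched contraction described above can be illustrated with a toy sketch. It is not the authors' implementation: it assumes dense matrices without symmetry blocks, runs on the CPU with NumPy, and simulates the workers with a plain loop instead of GPUs and inter-process communication. The function names `partial_sigma` and `distributed_sigma` and the round-robin assignment of operator terms are hypothetical choices made for illustration only.

```python
import numpy as np


def partial_sigma(left_ops, right_ops, psi):
    """One worker's share: sum_i L_i @ psi @ R_i^T, computed as batched products.

    left_ops:  (n, Dl, Dl) stack of left-block operator matrices
    right_ops: (n, Dr, Dr) stack of right-block operator matrices
    psi:       (Dl, Dr) trial wave function in matrix form
    """
    # np.matmul broadcasts over the leading axis, so each call below is
    # a single batched matrix product over all n terms.
    tmp = np.matmul(left_ops, psi)                       # (n, Dl, Dr)
    out = np.matmul(tmp, right_ops.transpose(0, 2, 1))   # (n, Dl, Dr)
    return out.sum(axis=0)


def distributed_sigma(left_ops, right_ops, psi, n_workers):
    """Split the operator terms over workers, then reduce the partial results.

    Assumption: terms are assigned round-robin; the loop stands in for
    independent workers, and the accumulation stands in for a sum-reduction.
    """
    n_terms = left_ops.shape[0]
    sigma = np.zeros_like(psi)
    for rank in range(n_workers):
        idx = np.arange(rank, n_terms, n_workers)  # terms owned by this worker
        sigma += partial_sigma(left_ops[idx], right_ops[idx], psi)
    return sigma
```

In this sketch the number of terms plays the role of the O(K²) operators, and each worker touches only its own subset of operator matrices, while every worker needs the full trial wave function. The result is independent of the number of workers, up to floating-point rounding.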