We introduce novel algorithmic solutions for hybrid CPU-multiGPU tensor network state algorithms utilizing non-Abelian symmetries, building on AI-motivated state-of-the-art hardware and software technologies. The presented numerical simulations on the FeMo cofactor, which plays a crucial role in converting atmospheric nitrogen to ammonia, are far beyond the scope of traditional approaches. Our large-scale SU(2) spin-adapted density matrix renormalization group calculations, up to bond dimension D = 2^16 on a complete active space (CAS) of 18 electrons in 18 orbitals [CAS(18, 18)], demonstrate that the current limit of the exact solution, i.e., the full-CI limit, can be reached in a fraction of the time. Furthermore, benchmarks up to CAS(113, 76) demonstrate the utilization of NVIDIA's highly specialized AI accelerators via NVIDIA Tensor Cores, leading to a performance of around 115 TFLOPS on a single node equipped with eight NVIDIA A100 devices. As a consequence of reaching 71% of the full capacity of the hardware, the cubic scaling of computational time with bond dimension can be reduced to a linear form for a broad range of D values; thus, breaking the current computational limits of small CAS spaces in ab initio quantum chemistry and materials science is becoming a reality. In comparison to strict U(1) implementations with matching accuracy, our solution has an estimated effective performance of 300-500 TFLOPS, which emphasizes the mutual need for both algorithmic and technological developments to push the current frontiers of classical computation.