We present DFT-FE 1.0, building on DFT-FE 0.6 Motamarri et al. (2020) [28], to conduct fast and accurate large-scale density functional theory (DFT) calculations (reaching ∼100,000 electrons) on both many-core CPU and hybrid CPU-GPU computing architectures. This work involves improvements in the real-space formulation—via an improved treatment of the electrostatic interactions that substantially enhances the computational efficiency—as well high-performance computing aspects, including the GPU acceleration of all the key compute kernels in DFT-FE. We demonstrate the accuracy by comparing the ground-state energies, ionic forces and cell stresses on a wide-range of benchmark systems against those obtained from widely used DFT codes. Further, we demonstrate the numerical efficiency of our GPU acceleration, which yields ∼20× speed-up on hybrid CPU-GPU nodes of the Summit supercomputer. Notably, owing to the parallel-scaling of the GPU implementation, we obtain wall-times of 80−140 seconds for full ground-state calculations, with stringent accuracy, on benchmark systems containing ∼6,000−15,000 electrons using 64−224 nodes of the Summit supercomputer. Program summaryProgram Title: DFT-FECPC Library link to program files:https://doi.org/10.17632/c5ghfc6ctn.1Developer's repository link:https://github.com/dftfeDevelopers/dftfeLicensing provisions: LGPL v3Programming language: C/C++External routines/libraries: p4est (http://www.p4est.org/), deal.II (https://www.dealii.org/), BLAS (http://www.netlib.org/blas/), LAPACK (http://www.netlib.org/lapack/), ELPA (https://elpa.mpcdf.mpg.de/), ScaLAPACK (http://www.netlib.org/scalapack/), Spglib (https://atztogo.github.io/spglib/), ALGLIB (http://www.alglib.net/), LIBXC (http://www.tddft.org/programs/libxc/), PETSc (https://www.mcs.anl.gov/petsc), SLEPc (http://slepc.upv.es), NCCL (optional-https://github.com/NVIDIA/nccl).Nature of problem: Density functional theory calculations.Solution method: We employ a local real-space variational formulation of Kohn-Sham density functional theory that is applicable for both pseudopotential and all-electron calculations on periodic, semi-periodic and non-periodic geometries. Higher-order adaptive spectral finite-element basis is used to discretize the Kohn-Sham equations. Chebyshev polynomial filtered subspace iteration procedure (ChFSI) is employed to solve the nonlinear Kohn-Sham eigenvalue problem self-consistently. ChFSI in DFT-FE employs Cholesky factorization based orthonormalization, and spectrum splitting based Rayleigh-Ritz procedure in conjunction with mixed precision arithmetic. Configurational force approach is used to compute ionic forces and periodic cell stresses for geometry optimization.Additional comments including restrictions and unusual features: Exchange correlation functionals are restricted to Local Density Approximation (LDA) and Generalized Gradient Approximation (GGA), with and without spin. The pseudopotentials available are optimized norm conserving Vanderbilt (ONCV) pseudopotentials and Troullier–Martins (TM) pseudopotentials. Calculations are non-relativistic.DFT-FE handles all-electron and pseudopotential calculations in the same framework, while accommodating periodic, non-periodic and semi-periodic boundary conditions.
Read full abstract