Advancing Large Scale Many-Body QMC Simulations on GPU Accelerated Multicore Systems

Andres Tomas, Richard Scalettar, Zhaojun Bai, Chia-Chen Chang

doi:10.1109/ipdps.2012.37

Publication Date: May 1, 2012

Citations: 5

Affiliation: University of California, Davis

Abstract

The Determinant Quantum Monte Carlo (DQMC) method is one of the most powerful approaches for understanding properties of an important class of materials with strongly interacting electrons, including magnets and superconductors. It treats these interactions exactly, but the solution of a system of N electrons must be extrapolated to bulk values. Currently N ≈ 500 is state-of-the-art. Increasing N is required before DQMC can be used to model newly synthesized materials like functional multilayers. DQMC requires millions of linear algebra computations on matrices of order N and scales as N^3. DQMC cannot exploit parallel distributed memory computers efficiently due to limited scalability with the small matrix sizes and stringent procedures for numerical stability. Today, the combination of multisocket multicore processors and GPUs provides widely available platforms with new opportunities for DQMC parallelization. The kernel of DQMC, the calculation of the Green's function, involves long products of matrices. For numerical stability, these products must be computed using graded decompositions generated by the QR decomposition with column pivoting. The high communication overhead of pivoting limits parallel efficiency. In this paper, we propose a novel approach that exploits the progressive graded structure to reduce the communication costs of pivoting. We show that this method preserves the same numerical stability and achieves 70% of the performance of highly optimized DGEMM on a two-socket six-core Intel processor. We have integrated this new method and other parallelization techniques into QUEST, a modern DQMC simulation package. Using 36 hours on this Intel processor, we are able to accurately compute the magnetic properties and Fermi surface of a system of N = 1024 electrons. This simulation is almost an order of magnitude more difficult than N ≈ 500, owing to the N^3 scaling.
This increase in system size will allow, for the first time, the computation of the magnetic and transport properties of layered materials with DQMC. In addition, we show preliminary results which further accelerate DQMC simulations by using GPU processors.
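
To illustrate the graded decomposition mentioned in the abstract, the following is a minimal sketch of a long matrix product accumulated through QR with column pivoting. It is not the paper's new method and not code from QUEST: it shows only the conventional pivoted baseline, and the function name graded_product, the use of NumPy and SciPy, and the random test matrices are assumptions made for illustration. The product is kept in the factored form Q diag(D) T, so the widely varying scales sit in the diagonal D rather than being mixed into one dense matrix.

```python
import numpy as np
from scipy.linalg import qr


def graded_product(matrices):
    """Accumulate matrices[-1] @ ... @ matrices[0] as Q @ diag(D) @ T.

    Illustrative sketch only (hypothetical helper, not from QUEST).
    Assumes square, non-singular factors so that diag(R) has no zeros.
    """
    n = matrices[0].shape[0]
    Q, D, T = np.eye(n), np.ones(n), np.eye(n)
    for B in matrices:
        # Form (B Q) diag(D): broadcasting scales column j by D[j].
        C = (B @ Q) * D
        # Column-pivoted QR: C[:, piv] == Q @ R.
        Q, R, piv = qr(C, pivoting=True)
        # Pull the graded scales out of R.
        D = np.diag(R).copy()
        # Build diag(D)^-1 R P^T by undoing the column permutation.
        M = np.empty_like(R)
        M[:, piv] = R / D[:, None]
        # Fold the well-scaled factor into T.
        T = M @ T
    return Q, D, T


# Usage: compare against the direct product on a short, benign chain.
rng = np.random.default_rng(0)
mats = [rng.standard_normal((6, 6)) for _ in range(5)]
Q, D, T = graded_product(mats)
direct = np.linalg.multi_dot(mats[::-1])
print(np.allclose(Q @ np.diag(D) @ T, direct))
```

On a short chain like this the direct product is already accurate, so the comparison only checks that the factorization is consistent. The factored form matters for the long, badly scaled chains that arise in DQMC, where the pivoting step in each QR is the communication bottleneck the abstract refers to.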
