Abstract

The starting point of any lattice QCD computation is the generation of a Markov chain of gauge field configurations. Because of the large number of lattice links and the matrix multiplications involved, generating SU(Nc) lattice QCD configurations is a highly demanding computational task that requires advanced parallel computer architectures, such as clusters of several Central Processing Units (CPUs) or Graphics Processing Units (GPUs). In this paper we present CUDA codes for NVIDIA GPUs that generate SU(Nc) lattice QCD pure gauge configurations, and we explore their performance. Our single-GPU implementation uses CUDA, and our multi-GPU implementation uses OpenMP together with CUDA. We present optimized CUDA codes for SU(2), SU(3) and SU(4). We also show a generic SU(Nc) code for Nc≥4 and compare it with the optimized SU(4) version. Our codes are publicly available for free use by the lattice QCD community.
