Abstract
In recent years the computational capacity of single Field Programmable Gate Arrays (FPGA) devices as well as their versatility has increased significantly. Adding to that the High Level Synthesis frameworks allowing to program such processors in a high level language like C++, makes modern FPGA devices a serious candidate as building blocks of a general purpose High Performance Computing solution. In this contribution we describe benchmarks which we performed using a Lattice QCD code, a highly compute-demanding HPC academic code for elementary particle simulations. We benchmark the performance of a single FPGA device running in two modes: using the external or embedded memory. We discuss both approaches in detail using the Xilinx U250 device and provide estimates for the necessary memory throughput and the minimal amount of resources needed to deliver optimal performance depending on the available hardware platform.
Highlights
Quantum Chromodynamics is the theory describing the interactions of quarks and gluons, explaining why the latter form bound states such as protons and neutrons
We propose two implementations on the Field Programmable Gate Array (FPGA) devices which differ by the location where the main data is stored, either these are registers in the programmable logic, or an external DDR memory bank attached to the programmable logic
We showed that lattices up to the size of 12 × 83 data points in each direction in double precision can fit into the internal memory of the programmable logic of the FPGA devices available currently on the market
Summary
Quantum Chromodynamics is the theory describing the interactions of quarks and gluons, explaining why the latter form bound states such as protons and neutrons. In the discretized version of Quantum Chromodynamics the basic degrees of freedom are associated to each point of a four-dimensional grid representing a finite volume of four-dimensional space-time. One can exploit the structure of the SU (3) matrices and parametrize them in terms of 10 input words each, instead of 18 in the naive formulation (9 real and 9 imaginary entries) One of the simplest algorithms allowing to invert such a matrix is an iterative conjugate gradient algorithm The relevance of this algorithm is demonstrated by the fact that the HPCG benchmark was introduced since November 2017 as a new ranking of supercomputers published by the TOP500 organization.
Published Version (
Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have