Experiences Porting the SU3_Bench Microbenchmark to the Intel Arria 10 and Xilinx Alveo U280 FPGAs

Douglas Doerfler, Marco Siracusa, Farzad Fatollahi-Fard, Colin Maclean, Samuel Williams, Tan Nguyen, Nicholas Wright

doi:10.1145/3456669.3456671

Abstract

In this study we investigate the implications of porting a common computational kernel used in high performance computing, which has been optimized for efficient execution on general purpose graphics processing units (GPUs), to a field programmable gate array (FPGA). In particular, we use a benchmark based on a matrix-matrix multiply kernel commonly used in lattice quantum chromodynamics applications. The microbenchmark is based on the OpenCL programming language. We evaluate the performance and portability aspects of two FPGAs, the Intel Arria 10 and the Xilinx Alveo U280. The purpose of the study is not to compare the two FPGAs, but to evaluate the effectiveness of their respective OpenCL toolchains and the level of effort needed to port a GPU-optimized code to an FPGA. We found the toolchains to be relatively easy to use, and correct results were obtained with little effort, but significant effort was needed to get relatively good performance. We found that FPGAs perform best when using single work item kernels, as opposed to the nominal multiple work item NDRange kernel used for CPUs and GPUs. In addition, other source code changes were necessary; in particular, the lack of a local cache in FPGA architectures can require a significant rewrite of the code. The performance achieved with the Intel Arria 10 was 47.6% of its maximum sustained bandwidth, while the Xilinx Alveo U280 achieved 35.2%. By comparison, GPU architectures have been shown to demonstrate 75% to 90% architectural efficiencies.
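The contrast between an NDRange kernel and a single work item kernel can be illustrated with a minimal sketch in plain C. It is not the SU3_Bench source: the `su3_matrix` type, the function names, and the one-matrix-per-site layout are assumptions made for illustration, and `gid` stands in for what an OpenCL kernel would obtain from `get_global_id(0)`.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical 3x3 complex matrix type, assumed for illustration only. */
typedef struct { float complex e[3][3]; } su3_matrix;

/* NDRange style: this body would run once per work item.
 * `gid` plays the role of get_global_id(0) in an OpenCL kernel. */
static void mult_su3_item(const su3_matrix *a, const su3_matrix *b,
                          su3_matrix *c, size_t gid)
{
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            float complex sum = 0.0f;
            for (int k = 0; k < 3; k++)
                sum += a[gid].e[i][k] * b[gid].e[k][j];
            c[gid].e[i][j] = sum;
        }
    }
}

/* Single work item style: one invocation walks every site in an
 * explicit loop, which an FPGA toolchain can turn into a pipeline. */
static void mult_su3_single(const su3_matrix *a, const su3_matrix *b,
                            su3_matrix *c, size_t n_sites)
{
    for (size_t site = 0; site < n_sites; site++)
        mult_su3_item(a, b, c, site);
}
```

On a GPU the runtime would launch the first function across many work items, whereas the second form expresses the same arithmetic as one loop nest. The sketch shows only the structural difference and says nothing about the memory access changes the abstract also mentions.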
