A memory bandwidth improvement with memory space partitioning for single-precision floating-point FFT on Stratix 10 FPGA

Takaaki Miyajima, Kentaro Sano

doi:10.1109/cluster48925.2021.00117

Publication Date: Sep 1, 2021

Citations: 2

Affiliation: Meiji University

Abstract

The Fast Fourier Transform (FFT) is one of the fundamental computational methods in computational science and high-performance computing. Single-precision floating-point complex FFT is known to be bound by memory bandwidth, and it often becomes the bottleneck when accelerating applications in these fields. To overcome this problem, we are researching and developing a parallel FFT on one or more FPGAs. In this paper, we discuss the memory bandwidth of single-precision floating-point complex FFT on an FPGA. Our FFT implementation is based on a state-of-the-art OpenCL implementation provided by Intel. We first show that the computational performance of the FFT on the Intel PAC D5005 is proportional to the effective bandwidth of the main memory. We then propose a memory sub-system that improves the effective memory bandwidth: it partitions the memory space and uses sub-modules that each access one memory space individually. Our FPGA design runs at 270 MHz and uses two DDR4-2400 memory channels for reading and writing. For an FFT of 16,777,216 (2^24) points, the proposed memory sub-system achieved an effective memory bandwidth of 22.57 GB/s, which is 65.3% of the theoretical peak of this implementation.
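The bandwidth figures above can be cross-checked with a little arithmetic. The Python sketch below is a back-of-the-envelope model, not the authors' method. It assumes that each memory channel's kernel-side interface moves one 512-bit (64-byte) word per 270 MHz cycle (this width is not stated in the abstract; it is inferred because 22.57 / 0.653 ≈ 34.56 GB/s = 2 × 64 B × 270 MHz), that 1 GB = 10^9 bytes, and that each complex single-precision point occupies 8 bytes. The `passes` argument of the hypothetical helper `est_gflops` is likewise an assumption, since the number of memory passes in the Intel OpenCL FFT design is not given here. The helper uses the conventional 5·N·log2(N) flop count for a complex FFT.

```python
import math

# Assumed parameters: only the clock, channel count, and DDR4-2400 speed come from the abstract.
KERNEL_CLOCK_HZ = 270e6   # FPGA design frequency (from the abstract)
BYTES_PER_CYCLE = 64      # ASSUMED: one 512-bit word per channel per kernel cycle
NUM_CHANNELS = 2          # two DDR4-2400 channels (from the abstract)
DDR4_2400_TPS = 2400e6    # DDR4-2400: 2400 mega-transfers per second per channel
DDR4_BUS_BYTES = 8        # standard 64-bit DDR4 data bus
COMPLEX64_BYTES = 8       # one single-precision complex point (2 x 4-byte floats)


def kernel_peak_gbs() -> float:
    """Peak bandwidth the kernel can consume at its clock, in GB/s (1 GB = 1e9 B)."""
    return NUM_CHANNELS * BYTES_PER_CYCLE * KERNEL_CLOCK_HZ / 1e9


def dram_peak_gbs() -> float:
    """Raw peak bandwidth of the two DDR4-2400 channels, in GB/s."""
    return NUM_CHANNELS * DDR4_2400_TPS * DDR4_BUS_BYTES / 1e9


def pass_time_ms(n_points: int, effective_gbs: float) -> float:
    """Time to read and write every point once at the given effective bandwidth."""
    bytes_moved = 2 * n_points * COMPLEX64_BYTES  # one read plus one write per point
    return bytes_moved / (effective_gbs * 1e9) * 1e3


def est_gflops(n_points: int, effective_gbs: float, passes: int) -> float:
    """Hypothetical memory-bound estimate: 5*N*log2(N) flops over `passes` full passes."""
    flops = 5 * n_points * math.log2(n_points)
    seconds = passes * pass_time_ms(n_points, effective_gbs) / 1e3
    return flops / seconds / 1e9


if __name__ == "__main__":
    peak = kernel_peak_gbs()
    print(f"kernel-side peak: {peak:.2f} GB/s")             # 34.56 GB/s
    print(f"DDR4 raw peak:    {dram_peak_gbs():.2f} GB/s")  # 38.40 GB/s
    print(f"efficiency:       {22.57 / peak:.1%}")          # 65.3%
    print(f"one pass, 2^24 points: {pass_time_ms(2**24, 22.57):.2f} ms")
    # In a memory-bound model, doubling the effective bandwidth doubles the estimate.
    print(f"est. GFLOPS (hypothetical 3 passes): {est_gflops(2**24, 22.57, 3):.1f}")
```

Measured against the raw 38.4 GB/s of the two DDR4-2400 channels, 22.57 GB/s would be about 58.8%. The abstract's 65.3% matches the smaller kernel-side peak of about 34.56 GB/s instead, which is consistent with "the theoretical peak of this implementation" being set by the 270 MHz design clock rather than by the DRAM. Because `est_gflops` divides a fixed flop count by a time proportional to 1/bandwidth, it reproduces the proportionality between FFT performance and effective memory bandwidth that the abstract reports.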
