Abstract

The Fast Fourier Transform (FFT) is one of the fundamental computational methods in computational science and high-performance computing. The single-precision floating-point complex FFT is known to be memory-bandwidth-bound, and it often becomes the bottleneck when accelerating applications in these fields. To overcome this problem, we are researching and developing a parallel FFT on FPGA(s). In this paper, we discuss the memory bandwidth of the single-precision floating-point complex FFT on an FPGA. Our FFT implementation is based on a state-of-the-art OpenCL implementation provided by Intel. We first show that the computational performance of the FFT on the Intel PAC D5005 is proportional to the effective bandwidth of the main memory. We then propose a memory sub-system that improves the effective memory bandwidth. It partitions the memory space and uses sub-modules that each access one memory space individually. In our FPGA design, which runs at 270 MHz, two DDR4-2400 memory channels are used, one for reading and one for writing. When the number of FFT data points was 16,777,216, the proposed memory sub-system achieved an effective memory bandwidth of 22.57 GB/s, which is 65.3% of the theoretical peak of this implementation.
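The 65.3% figure can be checked with a short calculation. The Python sketch below is a back-of-the-envelope check, not the authors' model. It assumes that each DDR4 channel is reached through a 512-bit (64-byte) interface clocked at the 270 MHz kernel frequency. The abstract does not state this width; it is inferred because it reproduces the quoted ratio. Under that assumption, the kernel-side peak for two channels is below the raw DDR4-2400 peak. This is consistent with the abstract quoting the peak "of this implementation" rather than of the memory itself.

```python
# Back-of-the-envelope check of the bandwidth figures quoted in the abstract.
# ASSUMPTION: each DDR4 channel is accessed through a 512-bit (64-byte)
# interface clocked at the 270 MHz kernel frequency; the abstract does not
# state this interface width.

KERNEL_CLOCK_HZ = 270e6       # FPGA design frequency (from the abstract)
BYTES_PER_CYCLE = 64          # assumed 512-bit interface per channel
NUM_CHANNELS = 2              # one channel for reading, one for writing

# Raw DDR4-2400 peak: 2400 MT/s on a 64-bit (8-byte) bus = 19.2 GB/s per channel.
DDR4_2400_PEAK = 2400e6 * 8


def kernel_side_peak(clock_hz, bytes_per_cycle, channels):
    """Peak bandwidth the kernel can request, in bytes per second."""
    return clock_hz * bytes_per_cycle * channels


peak = kernel_side_peak(KERNEL_CLOCK_HZ, BYTES_PER_CYCLE, NUM_CHANNELS)
measured = 22.57e9            # effective bandwidth reported for N = 2**24
efficiency = measured / peak

# A single-precision complex sample is two float32 values, i.e. 8 bytes.
N = 16_777_216
bytes_per_pass = N * 8        # size of one full sweep over the data set

print(f"kernel-side peak : {peak / 1e9:.2f} GB/s")
print(f"efficiency       : {efficiency:.1%}")
print(f"DDR4-2400 peak   : {NUM_CHANNELS * DDR4_2400_PEAK / 1e9:.1f} GB/s")
print(f"data per pass    : {bytes_per_pass / 2**20:.0f} MiB")
```

Running it prints a kernel-side peak of 34.56 GB/s and an efficiency of 65.3%. The raw two-channel DDR4-2400 peak is 38.4 GB/s, and one sweep over 16,777,216 samples moves 128 MiB.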
