Abstract

Row-column algorithm is commonly used for 2-D FFT implementation on FPGA. However, in this algorithm, the strided memory access to external memory such as DRAM introduces significant delay for DRAM row activation, thus resulting in high DRAM energy and a significant amount of FPGA device energy consumed in idle state. In this paper, to optimize energy consumption of the 2-D FFT architecture, we employ an FPGA-based 1-D FFT kernel supporting processing streaming data to fully utilize the available bandwidth offered by the external memory , and balance the I/O bandwidth between the DRAM and FPGA to minimize the FPGA idle time. Furthermore, to avoid time consuming DRAM row activation, we decompose the required transposition by column-wise FFTs into smaller size problems, thus enabling on-chip local transposition which could be performed by the customized data permutation unit used in the 1-D FFT kernel. Compared with the baseline 2-D FFT architecture, the optimized architecture achieves 3.9×, 4.2× and 4.5× improvement in energy efficiency for 1024×1024, 4096 × 4096 and 8192 × 8192 points 2-D FFTs, respectively. We also estimate the peak energy efficiency of the FPGA-based 2-D FFT architecture. Our estimation shows that our optimized 2-D FFT Kernel can achieve 8.06 ∼ 8.31 GFLOPS/W for various 2-D FFTs, ie., up to 62% of the peak energy efficiency of 2-D FFT architecture on FPGA.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call