Abstract

A new class of fast Fourier transform (FFT) architecture, based on the use of distributed memories, is proposed for field-programmable gate arrays (FPGAs). Prominent features are high clock speeds, programmability, reduced look-up-table (LUT) and register usage, simplicity of design, and the capability to perform both power-of-two and non-power-of-two FFTs. The higher clock speeds, typically >500 MHz in 65 nm FPGA technology, are a consequence of new algorithms and a finer-grained structure than that of traditional pipelined FFTs. The programmability derives from the memory-based architecture, which is also scalable. The reduced LUT and register usage arises from a unique methodology for controlling word growth during computation that achieves high dynamic range, along with inherent systolic circuit characteristics: simple, regular, uniform arrays of processing elements, connected in nearest-neighbor fashion to minimize wiring lengths. The circuit goal was to maximize throughput and minimize the use of the FPGA LUT and register logic fabric. Comparison results from seven different designs, covering a spectrum of functionality (fixed-size, variable, floating-point and variable non-power-of-two FFTs), different FPGA vendors (Intel and Xilinx) and different FPGA types, showed increases in throughput per logic cell of up to 181%, with an average improvement of 94%.

Highlights

  • The discrete Fourier transform (DFT) is one of the most prominent signal processing algorithms and is used in a variety of applications within engineering, computer science, physics, and mathematics [1,2]. Since many of these applications are real-time or involve computations on large data sets, special-purpose parallel circuitry, coupled with fast Fourier transform (FFT) algorithms for reducing DFT computation times, is essential.

  • If the size of a DFT can be factored into a product of small numbers, the basic idea is for the distributed-memory-based architecture (DMBA) to sequentially perform an appropriate series of small transforms, based on these factors, to produce the DFT output.

  • Seven different field-programmable gate array (FPGA) FFT implementations are described, with the purpose of demonstrating how the same architecture can be used for a range of applications.
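
The factoring idea in the second highlight can be illustrated with the generic textbook two-factor (Cooley-Tukey) decomposition, in which an N = N1·N2 point DFT is computed as N2 column DFTs of length N1, a twiddle multiplication, and N1 row DFTs of length N2. The sketch below is an assumption-laden illustration only: it is not the base-b algorithm or the systolic mapping used by the DMBA, and the function names `dft` and `dft_two_factor` are hypothetical.

```python
import cmath

def dft(x):
    """Direct O(n^2) DFT, used for the small transforms and as a reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def dft_two_factor(x, n1, n2):
    """Generic N = n1*n2 decomposition (illustrative, not the DMBA algorithm).

    Input index:  n = n2*i1 + i2   (i1 < n1, i2 < n2)
    Output index: k = k1 + n1*k2   (k1 < n1, k2 < n2)
    """
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")

    # Step 1: column DFTs. Column c holds x[c], x[c + n2], x[c + 2*n2], ...
    cols = [dft(x[c::n2]) for c in range(n2)]  # cols[c][k1]

    # Step 2: twiddle multiplication by W_N^(c * k1).
    for c in range(n2):
        for k1 in range(n1):
            cols[c][k1] *= cmath.exp(-2j * cmath.pi * c * k1 / n)

    # Step 3: row DFTs of length n2, written to output index k1 + n1*k2.
    out = [0j] * n
    for k1 in range(n1):
        row = dft([cols[c][k1] for c in range(n2)])
        for k2 in range(n2):
            out[k1 + n1 * k2] = row[k2]
    return out

if __name__ == "__main__":
    x = [complex(i, -0.5 * i) for i in range(12)]
    err = max(abs(a - b) for a, b in zip(dft_two_factor(x, 3, 4), dft(x)))
    print(err < 1e-9)
```

Applying the same split recursively to N1 or N2 gives the "series of transforms" for sizes with more than two factors; the order of the factors changes the intermediate data layout but not the result.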

Summary

Introduction

SC-FDMA is the part of the LTE protocol [3] used for uplink data transmission. It involves DFT precoding of the transmitted signal, where the DFT can be any one of 35 transform sizes from 12 points to 1296 points, with N = 2^a 3^b 5^c and a, b, c non-negative integers. FPGAs are targeted because of their rapidly growing use in communications applications, e.g., base stations and remote radio heads at the top of cell phone towers. We provide results of mapping the DMBA to Xilinx
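
The count of 35 transform sizes can be checked with a short enumeration. The sketch below assumes, following the LTE resource-block structure of 12 subcarriers, that every valid size is a multiple of 12 whose only prime factors are 2, 3 and 5; since 12 = 2^2 · 3, the exponent of 5 can be zero. The function names are hypothetical and are not taken from the paper.

```python
def exponents_2_3_5(n):
    """Return (a, b, c) with n == 2**a * 3**b * 5**c, or None if n has other prime factors."""
    exps = []
    for p in (2, 3, 5):
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        exps.append(e)
    return tuple(exps) if n == 1 else None

def lte_dft_sizes(n_min=12, n_max=1296, step=12):
    """Multiples of `step` in [n_min, n_max] whose only prime factors are 2, 3 and 5.

    The defaults (12, 1296, 12) are assumptions matching the range quoted in the text.
    """
    return [n for n in range(n_min, n_max + 1, step)
            if exponents_2_3_5(n) is not None]

if __name__ == "__main__":
    sizes = lte_dft_sizes()
    print(len(sizes))              # 35
    print(sizes[0], sizes[-1])     # 12 1296
    print(exponents_2_3_5(12))     # (2, 1, 0)
    print(exponents_2_3_5(1200))   # (4, 1, 2)
```

Each (a, b, c) triple indicates which radix-2, radix-3 and radix-5 stages a factored transform of that size would need.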

FPGA Implementations
Related Work
Algorithm
Base-b Algorithm
Matrix–Matrix Systolic Array
Architecture
Column DFTs
Row DFTs
Reachable Transform Sizes
Physical Array
Programmability
Dynamic Range
Method
Floating-Point without FPGA Embedded Hardware Support
Floating Point with Embedded Hardware Support
DMBA Design Approach
On-the-Fly-Twiddle Coefficient Calculation
DMBA LTE SC-FDMA Transform Throughput and Latency
Comparison with Commercial Circuits
Other FPGA LTE Implementations
Conclusions