Abstract

The Discrete Fourier Transform (DFT) is one of the core operations in digital signal processing and communication systems. Many fundamental algorithms, such as convolution, spectrum estimation, and correlation, can be realized using the DFT. Furthermore, the DFT is widely used in embedded system applications such as wireless communication protocols based on Orthogonal Frequency Division Multiplexing (Wey et al., 2007) and radar image processing with Synthetic Aperture Radar (Fanucci et al., 1999). In practice, the DFT is difficult to implement directly because of its computational complexity. To reduce the amount of computation, Cooley and Tukey proposed the well-known Fast Fourier Transform (FFT) algorithm, which reduces the cost of an N-point DFT from O(N²) to O(N log₂ N) operations; a radix-2 FFT needs only (N/2)log₂N butterfly computations (Proakis & Manolakis, 2006).
Nevertheless, for embedded systems, and portable devices in particular, an efficient hardware realization of the FFT with small area, low power dissipation, and real-time computation is a significant challenge. The challenge is even more pronounced when FFTs with large transform lengths (more than 1024 points) must be realized in embedded hardware. The objective of this research is therefore to investigate hardware-efficient FFT architectures, with an emphasis on compact, low-power embedded realizations.
As VLSI technology has evolved, many architectures have been proposed to improve the performance and efficiency of FFT hardware. Pipelined architectures are widely used in FFT realizations because of their speed advantage (Li & Wanhammar, 1999; He & Torkelson, 1996; Hopkinson & Butler, 1992; Yang et al., 2006). Higher-radix (Hopkinson & Butler, 1992; Yang et al., 2006) and multi-butterfly (Bouguezel et al., 2004; X. Li et al., 2007) structures can also improve the performance of an FFT processor significantly, but they require substantially more hardware resources.
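To make the complexity figures above concrete, the following is a minimal software sketch of a textbook iterative radix-2 decimation-in-time FFT that also counts its butterflies. It is an illustrative model under stated assumptions (input length a power of two, complex values held in a Python list), not the hardware architecture studied in this work; the function names `naive_dft` and `fft_radix2_inplace` are hypothetical.

```python
import cmath
import math

def naive_dft(x):
    """Direct evaluation of the DFT definition: N*N complex products."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * k * m / n) for k in range(n))
            for m in range(n)]

def fft_radix2_inplace(x):
    """Hypothetical sketch: iterative radix-2 DIT FFT that overwrites list x.

    Returns the number of butterfly operations performed, which is
    (N/2) * log2(N) for an N-point transform."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    # Bit-reversal permutation so the results appear in natural order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            x[i], x[j] = x[j], x[i]
    butterflies = 0
    span = 1  # half-size of the sub-DFTs combined at this stage
    while span < n:
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(-2j * math.pi * k / (2 * span))  # twiddle factor
                a, b = start + k, start + k + span
                t = w * x[b]
                # In-place butterfly: both outputs replace the two inputs.
                x[a], x[b] = x[a] + t, x[a] - t
                butterflies += 1
        span *= 2
    return butterflies

if __name__ == "__main__":
    signal = [complex(i % 5, 0) for i in range(16)]
    data = list(signal)
    print(fft_radix2_inplace(data))  # 32
    ref = naive_dft(signal)
    print(max(abs(p - q) for p, q in zip(data, ref)) < 1e-9)  # True
```

For a 16-point input the count is 32, that is (16/2)·log₂16, compared with the 16² = 256 complex products evaluated by `naive_dft`; the gap grows quickly for the large transform lengths discussed above.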
Alternatively, shared-memory schemes with a single butterfly calculation unit (Cohen, 1976; Ma, 1994, 1999; Ma & Wanhammar, 2000; Wang et al., 2007) are preferred in many embedded FFT processors because they require the fewest hardware resources. Furthermore, an “in-place” addressing strategy is a practical choice for minimizing the amount of data memory. With the “in-place” strategy, the two outputs of the butterfly unit are written back to the memory locations of its two inputs, replacing the old data, so the data memory only has to hold the N samples being transformed. In in-place FFT processing, two data reads and two data writes occur in every clock cycle, so multiple memory banks and conflict-free addressing logic are required to perform four data accesses in one clock cycle. Consequently, a typical FFT processor is composed of three major components: i) butterfly calculation units, ii) conflict-free address generators for both data and coefficient accesses, and iii) multi-bank memory units.
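One well-known way to place the two operands of every radix-2 butterfly in different memory banks is to select the bank from the parity of the address bits: at each stage the two operand addresses differ in exactly one bit, so their parities always differ. The Python sketch below checks this property for a two-bank memory and contrasts it with naive low-bit interleaving. It is an illustration under assumptions, not necessarily the addressing scheme developed in this research; `bank_of`, `local_address`, `butterfly_schedule`, `count_conflicts`, and `local_addresses_unique` are hypothetical names.

```python
from collections import defaultdict

def bank_of(addr):
    """Hypothetical parity-based bank select: XOR of all address bits."""
    return bin(addr).count("1") & 1

def local_address(addr):
    """Address inside the selected bank: drop bit 0."""
    return addr >> 1

def butterfly_schedule(n):
    """Yield (stage, a, b) for each radix-2 butterfly of an in-place
    N-point FFT; at stage s the operands a and b differ only in bit s."""
    stages = n.bit_length() - 1
    for s in range(stages):
        span = 1 << s
        for start in range(0, n, 2 * span):
            for k in range(span):
                yield s, start + k, start + k + span

def count_conflicts(n, bank_fn):
    """Number of butterflies whose two operands map to the same bank."""
    return sum(1 for _, a, b in butterfly_schedule(n) if bank_fn(a) == bank_fn(b))

def local_addresses_unique(n):
    """Check that (bank, local address) maps 0..n-1 one-to-one onto 2 banks."""
    seen = defaultdict(set)
    for addr in range(n):
        seen[bank_of(addr)].add(local_address(addr))
    return len(seen) == 2 and all(len(v) == n // 2 for v in seen.values())

if __name__ == "__main__":
    n = 16
    print(count_conflicts(n, bank_of))           # 0: parity banking
    print(count_conflicts(n, lambda a: a & 1))   # 24: low-bit interleaving
```

Within each bank, `addr >> 1` serves as a unique local address, because two addresses that share it differ only in bit 0 and therefore have different parity. The sketch covers only the two operands of a single butterfly; overlapping the reads of one butterfly with the writes of an earlier one generally requires further measures, such as dual-port banks or additional banks.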
