Flexible-length fast Fourier transform for mapping onto single-instruction multiple-data computing architecture

K.J. Jones

doi:10.1049/ip-vis:20050039

Abstract

A novel and flexible approach to the design and implementation of fast Fourier transforms (FFTs) of standard (fixed-radix) or non-standard length is described. The approach adopts a two-factor formulation, based upon either the prime factor algorithm or the common factor algorithm, in which one factor, N1, is an integer power and the other, N2, is a small integer (odd- or even-valued). The (N1×N2)-point 1D discrete Fourier transform (DFT) can then be carried out via the ‘row–column’ method with the design of just two generic DFT/FFT modules. By processing the small row-DFTs coefficient-by-coefficient, rather than DFT-by-DFT, the usual requirement for matrix transposition of the row-DFT output is eliminated. This reduces the processing requirement to a number of independent processes, each of which generates a set of small-DFT coefficients and then computes a fixed-radix FFT of the resulting coefficient set. Hardware efficiency and simplicity are achieved by computing the small-DFT coefficients via a modified form of the Goertzel DFT (GDFT) filter, referred to here as the ‘dual-coefficient’ GDFT filter, which simultaneously computes two DFT coefficients. The resulting decomposition maps onto just ⌊½N2⌋+1 processing elements (PEs), via the single-instruction multiple-data computing architecture, in which each PE comprises two types of module: one corresponding to a dual-coefficient GDFT filter bank, the other to the associated dual-data FFTs. This yields flexible-length FFTs that lend themselves naturally to multi-level parallelisation, overcoming the communication problems associated with mapping sequentially optimal FFT algorithms onto multi-processor computing architectures.
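The decomposition summarised in the abstract can be illustrated with a short software sketch. The code below is not the paper's hardware design; it is a minimal illustration under several assumptions: the common factor variant (with twiddle factors) is used, N1 is taken to be a power of two, the textbook second-order Goertzel recursion stands in for the ‘dual-coefficient’ GDFT filter, and two ordinary radix-2 FFTs stand in for the dual-data FFT. All function names (`dual_goertzel`, `fft_radix2`, `flexible_fft`) are hypothetical. The sketch relies on the fact that coefficients k and N2−k share the same real feedback coefficient 2cos(2πk/N2), so one recursion yields both, and only ⌊½N2⌋+1 independent processes are needed.

```python
import cmath
import math


def dual_goertzel(seq, k):
    """Return (X[k], X[N-k]) of the N-point DFT of seq from one shared recursion."""
    n = len(seq)
    omega = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(omega)  # real feedback term, identical for k and N-k
    s1 = s2 = 0j                   # filter state: s[n-1] and s[n-2]
    for sample in seq:
        s1, s2 = sample + coeff * s1 - s2, s1
    # Two feed-forward taps on the same final state give the two coefficients.
    return (cmath.exp(1j * omega) * s1 - s2,
            cmath.exp(-1j * omega) * s1 - s2)


def fft_radix2(a):
    """Plain recursive radix-2 FFT (length must be a power of two)."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft_radix2(a[0::2]), fft_radix2(a[1::2])
    half = n // 2
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(half)]
    return ([even[k] + tw[k] for k in range(half)]
            + [even[k] - tw[k] for k in range(half)])


def flexible_fft(x, n1, n2):
    """(n1*n2)-point DFT: small-DFT coefficients first, then n1-point FFTs."""
    n = n1 * n2
    assert len(x) == n and (n1 & (n1 - 1)) == 0, "n1 assumed a power of two"
    out = [0j] * n
    # One independent process per coefficient pair (k2, n2 - k2).
    for k2 in range(n2 // 2 + 1):
        m2 = (n2 - k2) % n2
        set_k, set_m = [], []
        for i1 in range(n1):
            # Length-n2 subsequence x[i1], x[i1 + n1], x[i1 + 2*n1], ...
            ck, cm = dual_goertzel(x[i1::n1], k2)
            # Twiddle factors of the common factor algorithm.
            set_k.append(ck * cmath.exp(-2j * math.pi * i1 * k2 / n))
            set_m.append(cm * cmath.exp(-2j * math.pi * i1 * m2 / n))
        # The coefficient sets are already in FFT input order: no transposition.
        for k1, value in enumerate(fft_radix2(set_k)):
            out[k2 + n2 * k1] = value
        for k1, value in enumerate(fft_radix2(set_m)):
            out[m2 + n2 * k1] = value
    return out
```

As a usage example, `flexible_fft(x, 8, 3)` computes a 24-point transform with two independent processes (k2 = 0 and the pair k2 = 1, 2). For k2 = 0, and for k2 = N2/2 when N2 is even, the two outputs of the filter coincide, so this sketch does some redundant work that a real design would presumably avoid.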
