Abstract

The Fast Fourier Transform (FFT) is a widely used algorithm, frequently employed in environments where high performance is critical. In embedded systems, FFTs often have hard runtime constraints and must be evaluated on limited hardware. In this paper, we present a partitioned FFT algorithm (PFFTC) for the Cell Broadband Engine (Cell BE) that improves upon previous FFT implementations for this platform. PFFTC has three main phases: (i) partitioning the problem into independent sub-problems, (ii) solving the sub-problems in parallel, and (iii) combining the sub-problem results to obtain the solution to the original problem. PFFTC includes optimizations for exploiting data transfer parallelism, avoiding unnecessary communication through careful data routing, avoiding data dependency stalls with instruction-level double buffering, and minimizing synchronization overhead through the use of an “asynchronous” signal-based barrier. We evaluate the performance of PFFTC and other FFT algorithms for the Cell BE. Our results indicate that PFFTC attains a peak processing rate of 33.6 GFLOPS and achieves speedups ranging from 31% to 62% over the reported performance of the fastest previous Cell BE FFT algorithm for complex single-precision FFTs with 1,024 to 16,384 data points.
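The partition, solve, and combine structure can be illustrated with a generic decimation-in-time decomposition. The Python sketch below is a minimal illustration under stated assumptions. It splits an N-point input into P interleaved sub-sequences, computes each length-N/P sub-FFT independently, and then combines the results with twiddle factors using X[k] = Σ_r W_N^{rk} · Y_r[k mod N/P]. The function names (`fft_radix2`, `partitioned_fft`, `naive_dft`) are hypothetical. The partitioning scheme is a standard textbook one and is not necessarily the one PFFTC uses. None of the Cell BE optimizations (DMA parallelism, data routing, double buffering, the signal-based barrier) are modeled here.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def partitioned_fft(x, p):
    """Hypothetical partition/solve/combine FFT (generic sketch, not PFFTC itself).

    Assumes len(x) and p are powers of two with p <= len(x).
    """
    n = len(x)
    if n % p != 0:
        raise ValueError("p must divide len(x)")
    m = n // p
    # (i) Partition: sub-problem r holds every p-th sample starting at index r.
    subproblems = [x[r::p] for r in range(p)]
    # (ii) Solve: each length-m FFT is independent, so on a multicore
    # processor each could be assigned to its own core (assumption).
    sub_results = [fft_radix2(s) for s in subproblems]
    # (iii) Combine: X[k] = sum_r W_N^(r*k) * Y_r[k mod m].
    out = []
    for k in range(n):
        acc = 0j
        for r in range(p):
            acc += cmath.exp(-2j * cmath.pi * r * k / n) * sub_results[r][k % m]
        out.append(acc)
    return out


def naive_dft(x):
    """O(N^2) reference DFT, used only to check the sketch."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


if __name__ == "__main__":
    signal = [complex(i % 5, (2 * i) % 3) for i in range(64)]
    ref = naive_dft(signal)
    for parts in (1, 2, 4, 8):
        got = partitioned_fft(signal, parts)
        err = max(abs(a - b) for a, b in zip(got, ref))
        print(f"P={parts}: max abs error vs. naive DFT = {err:.2e}")
```

The combine step here is a direct O(N·P) sum, chosen for readability. A production implementation would instead apply radix-P butterflies and, as the abstract notes, overlap data transfers with computation.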
