Abstract

Computing the 1-D Fast Fourier Transform (FFT) with the conventional six-step FFT algorithm on parallel computers requires intensive all-to-all communication, because three of its six steps are matrix transposes. This all-to-all communication is a limiting factor in the performance of parallel FFT implementations. In this paper, we present two parallel algorithms that implement the 1-D FFT without all-to-all communication between processors, at the expense of more intra-processor computation than the conventional six-step FFT algorithm requires. Our analysis shows that these two algorithms have an advantage over the six-step FFT algorithm on parallel systems where the cost of inter-processor communication outweighs the cost of intra-processor computation. As a case study, we choose a 32-node Beowulf cluster with fast processors (running at 2 GHz) but relatively slow inter-processor communication (over a 100 Mbit/s switch). Simulation results on this cluster demonstrate that the proposed no-communication FFT algorithms achieve speedups ranging from 1.1 to 1.5 over the six-step FFT algorithm.
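To make the role of the three transposes concrete, the following is a minimal serial sketch of the textbook six-step FFT for a length N = n1 * n2 input, written in plain Python. It is an illustration only, not the implementation evaluated in the paper: the function names (`dft`, `transpose`, `six_step_fft`) and the use of a naive O(n^2) DFT for the short transforms are assumptions made for brevity. The proposed no-communication algorithms are not sketched here, since the abstract does not describe their steps.

```python
import cmath


def dft(v):
    """Naive O(n^2) DFT; stands in for the short FFTs of steps 2 and 5."""
    n = len(v)
    return [
        sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def transpose(m):
    """Transpose a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def six_step_fft(x, n1, n2):
    """Six-step FFT of a length n1*n2 sequence (serial illustration)."""
    n = n1 * n2
    assert len(x) == n
    # View x as an n2-by-n1 row-major matrix: a[j2][j1] = x[j1 + j2*n1].
    a = [x[r * n1:(r + 1) * n1] for r in range(n2)]
    # Step 1: transpose to n1-by-n2 (all-to-all in a distributed setting).
    a = transpose(a)
    # Step 2: n1 independent length-n2 transforms, one per row.
    a = [dft(row) for row in a]
    # Step 3: multiply entry (j1, k2) by the twiddle factor exp(-2*pi*i*j1*k2/N).
    a = [
        [a[j1][k2] * cmath.exp(-2j * cmath.pi * j1 * k2 / n) for k2 in range(n2)]
        for j1 in range(n1)
    ]
    # Step 4: transpose to n2-by-n1 (second all-to-all).
    a = transpose(a)
    # Step 5: n2 independent length-n1 transforms, one per row.
    a = [dft(row) for row in a]
    # Step 6: transpose to n1-by-n2 (third all-to-all); row-major order is X[k].
    a = transpose(a)
    return [value for row in a for value in row]
```

When the matrix rows are distributed across processors, steps 2, 3, and 5 touch only local data, whereas each of steps 1, 4, and 6 moves data between every pair of processors. That exchange is the all-to-all communication the abstract refers to.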
