Abstract
This paper presents an empirical comparison of two parallel implementations of a one-dimensional fast Fourier transform (FFT) targeted at a symmetric multiprocessor (SMP). It compares the run-time characteristics and overhead (time complexity) of the two algorithms with those reported in previous research. The scalability of the two algorithms is also assessed using the isoefficiency function, and the effect of caches on performance is presented. The isoefficiency function is defined as the rate at which the data size must grow with the number of processors to maintain constant efficiency. One implementation is based on a tree and the other on a transpose. For the tree algorithm, speedup does not increase linearly with the number of processors, although superlinear speedup can be achieved in the two-processor case. The transpose algorithm achieved approximately linear speedup with respect to the number of processors with only a moderate increase in data size. Additional performance can be obtained by overlapping computation with communication and by making efficient use of caches.
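The abstract names speedup, efficiency, overhead, and the isoefficiency function without giving formulas. The sketch below assumes the standard textbook definitions (speedup S = T1/Tp, efficiency E = S/p, total overhead To = p*Tp - T1, and the isoefficiency relation W = E/(1 - E) * To). The paper may define these quantities differently. All function names and the timings in the usage note are hypothetical and are not results from the paper.

```python
# Minimal sketch of standard parallel-performance metrics.
# Assumption: textbook definitions, not necessarily the paper's own.

def speedup(t_serial, t_parallel):
    """Ratio of serial run time to parallel run time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, p):
    """Speedup per processor; a value above 1.0 means superlinear speedup."""
    return speedup(t_serial, t_parallel) / p


def total_overhead(t_serial, t_parallel, p):
    """Total processor time spent beyond the serial work: p*Tp - T1."""
    return p * t_parallel - t_serial


def isoefficiency_work(overhead, target_efficiency):
    """Work W needed to hold efficiency at the target for a given overhead.

    Follows from E = W / (W + To), which gives W = E / (1 - E) * To.
    """
    if not 0.0 < target_efficiency < 1.0:
        raise ValueError("target_efficiency must lie strictly between 0 and 1")
    k = target_efficiency / (1.0 - target_efficiency)
    return k * overhead
```

For example, with made-up timings of 8.0 s serial and 2.5 s on 4 processors, `efficiency(8.0, 2.5, 4)` is 0.8 and `total_overhead(8.0, 2.5, 4)` is 2.0 s. If the overhead grows as processors are added, `isoefficiency_work` indicates how much the problem size (measured as serial work) would have to grow to keep efficiency at the same level.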