A performance model for Fast Fourier Transform

Yan Li, Li Zhao, Alex Chunghen Chow, Jeffrey R Diamond, Haibo Lin

doi:10.1109/ipdps.2009.5160995

Abstract

The Fast Fourier Transform (FFT) has been considered one of the most important computing algorithms for decades. Its vast application domain makes it an important performance benchmark for new computer architectures. The most common FFT algorithm, Cooley-Tukey, factorizes a large FFT into a combination of smaller ones. The choice of factors and the order in which they are applied are critical to the ultimate performance of the large FFT. Traditional hand-coded FFT libraries can immediately execute an FFT of a given size by applying constant heuristics to different kernel sizes, but the resulting plans are not always optimal. FFTW is a popular auto-tuning FFT library that searches over the possible factorizations and empirically determines the one with the best performance. This search method produces FFT kernels for a given size that are competitive with hand-tuned libraries. Unfortunately, the search process for a large size takes hours on real hardware, and is completely infeasible when evaluating the FFT performance of new hardware that is still in the simulation phase. It is also less than ideal in environments where a rapid response to an FFT of a new size is desirable. This paper introduces a novel performance model that allows the FFT performance for a given data size to be estimated to within 2% error without ever running the actual FFT. In addition, by recognizing more sophisticated patterns within the computation, this model reduces the search tree size from the number of permutations of the factors to the number of combinations. Because typical FFT sizes contain a large number of similar factors, this effectively reduces the search by an order of magnitude. When given a set of computational kernels, this model can completely characterize the performance of a chosen target architecture by running a few short performance tests on kernels of each size, a process which takes a few minutes or less.
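To give a feel for the gap between searching permutations and searching combinations of factors, the following minimal sketch counts both for one FFT size. The radix set `(2, 4, 8, 16)` and the function names are assumptions chosen for illustration; the abstract does not specify the kernel sizes used or how the model identifies equivalent orderings.

```python
from functools import lru_cache


def ordered_factorizations(n, radices):
    """Count sequences of radices (order matters) whose product is n.

    All radices are assumed to be integers greater than 1.
    """
    @lru_cache(maxsize=None)
    def count(m):
        if m == 1:
            return 1  # the empty sequence
        return sum(count(m // r) for r in radices if m % r == 0)

    return count(n)


def unordered_factorizations(n, radices):
    """Count multisets of radices (order ignored) whose product is n."""
    rs = tuple(sorted(radices))

    @lru_cache(maxsize=None)
    def count(m, i):
        if m == 1:
            return 1  # the empty multiset
        if i == len(rs):
            return 0  # no radices left to try
        total = count(m, i + 1)  # never use rs[i] again
        if m % rs[i] == 0:
            total += count(m // rs[i], i)  # use rs[i] once more
        return total

    return count(n, 0)


if __name__ == "__main__":
    size, kernels = 1024, (2, 4, 8, 16)  # hypothetical size and kernel set
    print(ordered_factorizations(size, kernels))
    print(unordered_factorizations(size, kernels))
```

For a 1024-point FFT with this hypothetical radix set, the ordered count is 401 and the unordered count is 23, so treating reorderings of the same factors as one candidate shrinks this particular search by more than an order of magnitude.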
Once characterized, an optimal FFT plan for a given input size can be determined in milliseconds instead of hours. In this paper, we first derive our mathematical model. We then validate its accuracy by using it to improve the performance of a state-of-the-art, hand-tuned FFT library by 30%. Finally, we demonstrate its effectiveness by replacing FFTW's own planning stage with our model, resulting in the same FFT performance using FFTW's own kernels in as little as one-millionth of the computation time.
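The idea of planning from a one-time kernel characterization, rather than timing whole FFTs, can be sketched as follows. This is not the paper's model, which the abstract does not describe. It uses a deliberately simple additive cost model, stated in the comments, and the timings in `KERNEL_COST` are made-up placeholders rather than measurements. The names `KERNEL_COST` and `best_plan` are hypothetical.

```python
from functools import lru_cache

# Hypothetical per-call kernel timings in arbitrary units, standing in for
# the short per-kernel performance tests. These are NOT values from the paper.
KERNEL_COST = {2: 1.0, 4: 1.6, 8: 3.0, 16: 7.5}


def best_plan(n, kernel_cost=KERNEL_COST):
    """Return (estimated_cost, radices) minimizing a toy additive model.

    Toy model (an assumption): a stage of radix r over n points calls the
    radix-r kernel n // r times, and the stage costs simply add up. Under
    this model the order of the stages does not change the estimate.
    """
    radices = tuple(sorted(kernel_cost))

    @lru_cache(maxsize=None)
    def solve(m):
        # Returns (cost per input point, radices used), or None if m
        # cannot be factored with the available kernels.
        if m == 1:
            return 0.0, ()
        best = None
        for r in radices:
            if m % r != 0:
                continue
            sub = solve(m // r)
            if sub is None:
                continue
            cand = (kernel_cost[r] / r + sub[0], (r,) + sub[1])
            if best is None or cand[0] < best[0]:
                best = cand
        return best

    result = solve(n)
    if result is None:
        raise ValueError(f"{n} cannot be factored with radices {radices}")
    per_point, plan = result
    return n * per_point, tuple(sorted(plan))


if __name__ == "__main__":
    print(best_plan(1024))
```

With the placeholder timings above, `best_plan(1024)` selects the radices `(8, 8, 16)`. The point of the sketch is only that, once per-kernel costs are tabulated, choosing a plan becomes a small table lookup and recursion instead of a series of timed FFT runs.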
