Performance Analysis of Parallel FFT on Large Multi-GPU Systems

Alan Ayala, Jack Dongarra, Azzam Haidar, Stan Tomov, Miroslav Stoyanov

doi:10.1109/ipdpsw55747.2022.00072

Abstract

In this paper we present a performance study of multidimensional Fast Fourier Transforms (FFTs) with GPU accelerators on modern hybrid architectures, such as those expected for upcoming exascale systems. We assess and leverage features from traditional implementations of parallel FFTs and provide an algorithm that encompasses a wide range of their parameters and adds novel developments such as FFT grid shrinking and batched transforms. Next, we create a bandwidth model to quantify the computational costs and analyze the well-known communication bottleneck for All-to-All and Point-to-Point MPI exchanges. Then, using a tuning methodology, we accelerate the FFT computation and reduce the communication cost, achieving linear scalability on a large-scale system with GPU accelerators. Finally, we extend our performance analysis to show that carefully tuning the algorithm can further accelerate applications that rely heavily on FFTs, as is the case for molecular dynamics software. Our experiments were performed on the Summit and Spock supercomputers, using IBM Power9 cores, over 3000 NVIDIA V100 GPUs, and AMD MI100 GPUs.
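
A multidimensional FFT is separable: it can be computed as batches of independent 1D transforms along one axis, followed by a data reshape (transpose), followed by batches along the next axis. In a distributed setting, that reshape is where the All-to-All or Point-to-Point exchanges mentioned above occur. The following is a minimal pure-Python sketch of this idea in 2D, not the authors' GPU implementation: a naive O(n^2) DFT stands in for a real FFT, and all function names (`dft`, `batched_dft`, `transpose`, `fft2_by_passes`, `dft2_direct`) are hypothetical, chosen for illustration only.

```python
import cmath

def dft(x):
    """Naive O(n^2) 1D DFT; a stand-in for a real FFT kernel."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]

def batched_dft(rows):
    """Apply the 1D transform independently to every row (one 'batch')."""
    return [dft(r) for r in rows]

def transpose(a):
    """Swap axes; in a distributed FFT this step is the global data exchange."""
    return [list(col) for col in zip(*a)]

def fft2_by_passes(a):
    """2D DFT as two batched 1D passes separated by transposes."""
    step1 = batched_dft(a)                  # transform along axis 1 (rows)
    step2 = batched_dft(transpose(step1))   # transform along axis 0 (columns)
    return transpose(step2)                 # restore the original layout

def dft2_direct(a):
    """Reference 2D DFT computed directly from the definition."""
    m, n = len(a), len(a[0])
    return [[sum(a[r][c] * cmath.exp(-2j * cmath.pi * (r * u / m + c * v / n))
                 for r in range(m) for c in range(n))
             for v in range(n)]
            for u in range(m)]

if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    out, ref = fft2_by_passes(a), dft2_direct(a)
    print(all(abs(x - y) < 1e-9 for ro, rr in zip(out, ref) for x, y in zip(ro, rr)))
```

The same pattern extends to 3D with one more batched pass and reshape. The local batched passes are cheap and parallel, while each transpose moves essentially the whole array across processes, which is why communication dominates at scale.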

Similar Papers

Hybrid and 4-D FFT implementations of an open-source parallel FFT package OpenFFT
Truong Vinh Truong Duy ... Taisuke Ozaki
The Journal of Supercomputing, Vol. 72, 14 Dec 2015

A family of MD FFT algorithms of complexity intermediate between the MD Cooley-Tukey FFT and the MD prime-factor FFT
R Bernardini ... G Cortelazzo
03 May 1993

Parallel pipelined multi radix variable length fast Fourier transform architecture
Journal of Critical Reviews, Vol. 7, 01 Apr 2020

Design of an approximate FFT processor based on approximate complex multipliers
Jinhe Du ... Chenggang Yan
01 Jul 2021
