A Framework for Low-Communication 1-D FFT

Ping Tak Peter Tang, Daehyun Kim, Vladimir Petrov, Jongsoo Park

doi:10.1155/2013/672424

Abstract

In high-performance computing on distributed-memory systems, communication often represents a significant part of the overall execution time. The relative cost of communication will certainly continue to rise as compute-density growth follows the current technology and industry trends. Design of lower-communication alternatives to fundamental computational algorithms has become an important field of research. For distributed 1-D FFT, communication cost has hitherto remained high as all industry-standard implementations perform three all-to-all internode data exchanges (also called global transposes). These communication steps indeed dominate execution time. In this paper, we present a mathematical framework from which many single-all-to-all and easy-to-implement 1-D FFT algorithms can be derived. For large-scale problems, our implementation can be twice as fast as leading FFT libraries on state-of-the-art computer clusters. Moreover, our framework allows a tradeoff between accuracy and performance, further boosting performance if reduced accuracy is acceptable.

Highlights

The works in [25,27], for example, do not count the communication cost incurred when each processor accesses the entire input data or reorders out-of-order results back into natural order.
There are many FFT algorithms, but they all factor the Discrete Fourier Transform (DFT) matrix algebraically into sparse factors, thereby reducing an O(N^2) arithmetic cost to O(N log N). This arithmetic cost reduction has been instrumental in many advances in computing since the Cooley–Tukey paper appeared.
While arithmetic cost reduction has long been a paradigm in computer science, a new paradigm is emerging as we enter a new era where raw arithmetic speed reaches a teraflop on a single die and parallelism in the form of multicore and multinode is prevalent.

Summary

Introduction

The works in [25,27], for example, do not count the communication cost incurred when each processor accesses the entire input data or reorders out-of-order results back into natural order. They would require O(N^(3/2)) computation cost as opposed to O(N log N). A task computes P sets of length-M FFTs; this decomposition fundamentally requires three all-to-all steps if data order is to be preserved. Recursive applications of this decomposition (or variants thereof) lead to an O(N log N) arithmetic complexity, but cannot undo the triple-all-to-all requirement. While internode communication is required during convolution, that amount is negligible, as each node merely needs an insignificant amount of data from its next-door neighbor.
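The conventional decomposition mentioned above, with P sets of length-M FFTs and three all-to-all steps, can be sketched on a single machine as follows. This is only an illustration of the standard triple-transpose structure for N = M * P, not the low-communication framework of the paper. The names `dft`, `transpose`, and `decomposed_dft` are hypothetical, a naive O(n^2) DFT stands in for the local FFT kernel, and each in-memory transpose marks the place where a distributed implementation would perform an all-to-all exchange.

```python
import cmath


def dft(v):
    """Naive O(n^2) DFT; a stand-in (assumption) for a local FFT kernel."""
    n = len(v)
    return [
        sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def transpose(rows):
    """Matrix transpose; on a cluster this would be an all-to-all exchange."""
    return [list(col) for col in zip(*rows)]


def decomposed_dft(x, M, P):
    """DFT of length N = M * P via the classic three-transpose decomposition."""
    N = M * P
    if len(x) != N:
        raise ValueError("len(x) must equal M * P")
    x = list(x)

    # View x as an M-by-P row-major matrix: A[m][p] = x[p + P*m].
    A = [x[m * P:(m + 1) * P] for m in range(M)]

    # Transpose 1, then P independent length-M transforms (one per row).
    B = [dft(row) for row in transpose(A)]  # B[p][r]

    # Pointwise twiddle factors exp(-2*pi*i*p*r/N).
    B = [
        [B[p][r] * cmath.exp(-2j * cmath.pi * p * r / N) for r in range(M)]
        for p in range(P)
    ]

    # Transpose 2, then M independent length-P transforms.
    D = [dft(row) for row in transpose(B)]  # D[r][q]

    # Transpose 3 restores natural order: X[r + M*q] = D[r][q].
    return [value for row in transpose(D) for value in row]
```

Calling `decomposed_dft(x, M, P)` on any list of `M * P` complex numbers should agree with `dft(x)` up to rounding error. Dropping the third transpose would leave the output in a permuted order, which is why order-preserving implementations of this decomposition pay for all three exchanges.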

Overview

Computing segment of interest – Theory

Computing segment of interest – Practice

Low-communication FFT framework

Implementation

Evaluation

Performance – Full accuracy

Performance – Accuracy tradeoff

Performance analysis and projection

Conclusion

Journal: Scientific Programming
Publication Date: Jan 1, 2013
Citations: 1
License type: CC BY 3.0
