Abstract

The increasing success of High Performance Scalable Computing has brought with it enhanced capabilities in numerical modeling using spectral methods and in the processing of large images. These types of algorithms require portable and reliable fast Fourier transforms (FFTs). Unfortunately, many of the parallel FFTs now available are neither portable nor as flexible as desired. To fill this need, Numerical Algorithms Group, Ltd., has ported its serial/vector FFTs C06FUF and C06FXF to distributed-memory architectures, with the communication handled by the Basic Linear Algebra Communication Subprograms (BLACS). The parallel FFTs C06FUFP and C06FXFP are portable and reliable, scale well, and are as flexible as their serial counterparts. This paper discusses the parallelization algorithm used to distribute these FFTs, focusing on the global array transposition algorithm implemented with the BLACS point-to-point communication calls DGESD2D and DGERV2D. The performance of the FFTs is presented, including observed measurements of the scalability…
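As a rough illustration of the transpose-based approach described above, the following Python sketch simulates, inside a single process, a 2-D transform of an n x n array split into p block rows. It assumes a square array with n divisible by p. The naive `dft` function stands in for a serial FFT kernel, and the block copy in `global_transpose` stands in for the DGESD2D/DGERV2D exchanges. All function names are hypothetical, and this is not NAG's C06FUFP implementation.

```python
import cmath


def dft(x):
    """Naive O(n^2) 1-D DFT using the exp(-2*pi*i*j*k/n) convention.

    Hypothetical stand-in for a serial 1-D FFT kernel.
    """
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def global_transpose(blocks, p):
    """Simulated distributed transpose of an n x n array held as p row blocks.

    blocks[r] is the b x n slab of rows owned by process r (b = n // p).
    Process r would send the b x b sub-block in columns s*b..(s+1)*b-1 to
    process s (the role a DGESD2D call plays); process s receives one
    sub-block from every r (a DGERV2D call) and transposes it into place.
    """
    b = len(blocks[0])
    result = []
    for s in range(p):                      # receiving process
        rows = [[0j] * (b * p) for _ in range(b)]
        for r in range(p):                  # sending process
            for i in range(b):              # local row index on the sender
                for j in range(b):          # column offset inside block s
                    rows[j][r * b + i] = blocks[r][i][s * b + j]
        result.append(rows)
    return result


def parallel_fft2(a, p):
    """2-D DFT of a square array: row DFTs, global transpose, row DFTs."""
    n = len(a)
    if n % p != 0:
        raise ValueError("n must be divisible by the number of processes p")
    b = n // p
    # Block-row distribution: process r owns rows r*b .. r*b + b - 1.
    blocks = [[list(row) for row in a[r * b:(r + 1) * b]] for r in range(p)]
    blocks = [[dft(row) for row in blk] for blk in blocks]  # transform along 2nd index
    blocks = global_transpose(blocks, p)                     # columns become local rows
    blocks = [[dft(row) for row in blk] for blk in blocks]  # transform along 1st index
    blocks = global_transpose(blocks, p)                     # restore natural ordering
    return [row for blk in blocks for row in blk]


def dft2_reference(a):
    """Direct 2-D DFT, used only to check the distributed result."""
    n = len(a)
    return [[sum(a[j1][j2] * cmath.exp(-2j * cmath.pi * (j1 * k1 + j2 * k2) / n)
                 for j1 in range(n) for j2 in range(n))
             for k2 in range(n)] for k1 in range(n)]


if __name__ == "__main__":
    a = [[complex(4 * i + j) for j in range(4)] for i in range(4)]
    got = parallel_fft2(a, p=2)
    ref = dft2_reference(a)
    print(max(abs(got[i][j] - ref[i][j]) for i in range(4) for j in range(4)))
```

In each transpose every process keeps one b x b block and sends the other p - 1, so roughly (p - 1)/p of the array crosses the network; this all-to-all exchange, rather than the local 1-D transforms, is often what bounds scalability. A production code might also skip the final transpose and return the result in transposed order.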
