Designing and auto-tuning parallel 3-D FFT for computation-communication overlap

Sukhyun Song,Jeffrey K Hollingsworth

doi:10.1145/2692916.2555249

Journal: ACM SIGPLAN Notices	Publication Date: Feb 6, 2014
Citations: 4

Affiliation: University of Maryland, College Park

#Improve Cache Performance #Computation-communication Overlap

Abstract

This paper presents a method to design and auto-tune a new parallel 3-D FFT code using the non-blocking MPI all-to-all operation. We achieve high performance by optimizing computation-communication overlap. Our code performs fully asynchronous communication without any support from special hardware. We also improve cache performance through loop tiling. To cope with the complex trade-off regarding our optimization techniques, we parameterize our code and auto-tune the parameters efficiently in a large parameter space. Experimental results from two systems confirm that our code achieves a speedup of up to 1.76x over the FFTW library.
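
The loop tiling mentioned in the abstract can be illustrated with a small sketch. This is not the authors' code: the function name `tiled_transpose`, the flat row-major layout, and the `tile` parameter are assumptions made for illustration only. It transposes a 2-D array block by block, which is the kind of data-reordering step where tiling typically helps cache reuse, and it exposes the tile size as the sort of parameter an auto-tuner could search over. In pure Python the cache benefit will not be measurable; the sketch only shows the loop structure.

```python
def tiled_transpose(a, n_rows, n_cols, tile=32):
    """Transpose a row-major n_rows x n_cols flat list, one tile at a time.

    `tile` is a hypothetical tunable parameter: the edge length of the
    square block processed before moving on to the next block.
    """
    out = [0] * (n_rows * n_cols)
    # Outer loops walk over the top-left corner of each tile.
    for i0 in range(0, n_rows, tile):
        for j0 in range(0, n_cols, tile):
            # Inner loops stay inside one tile, clipped at the array edge,
            # so reads and writes touch a small region of memory.
            for i in range(i0, min(i0 + tile, n_rows)):
                for j in range(j0, min(j0 + tile, n_cols)):
                    out[j * n_rows + i] = a[i * n_cols + j]
    return out
```

The result does not depend on `tile`; only the traversal order changes. That is why the tile size can be treated purely as a performance parameter and chosen empirically per system.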
