Intel Cilk Plus for complex parallel algorithms: “Enormous Fast Fourier Transforms” (EFFT) library

Ryo Asai, Andrey Vladimirov

doi:10.1016/j.parco.2015.05.004

Abstract

In this paper we demonstrate a methodology for parallelizing the computation of large one-dimensional discrete fast Fourier transforms (DFFTs) on multi-core Intel Xeon processors. DFFTs based on the recursive Cooley–Tukey method have to control cache utilization, memory bandwidth and vector hardware usage, and at the same time scale across multiple threads or compute nodes. Our method builds on a single-threaded Intel Math Kernel Library (MKL) implementation of real-to-complex DFFT, and uses the Intel Cilk Plus framework for thread parallelism. We demonstrate the ability of Intel Cilk Plus to handle parallel recursion with nested loop-centric parallelism without tuning the code to the number of cores or cache metrics. The result of our work is a library called EFFT that performs 1D DFTs of size 2^N for N ≥ 21 faster than the corresponding Intel MKL parallel DFT implementation by up to 1.5×, and faster than FFTW by up to 2.5×. The code of EFFT is available for free download under the GPLv3 license. This work provides a new efficient DFFT implementation, and at the same time serves as an educational example of how computer science problems with complex parallel patterns can be optimized for high performance using the Intel Cilk Plus framework.
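To make the recursive pattern mentioned in the abstract concrete, the following is a minimal sketch of a radix-2 Cooley–Tukey transform in plain Python. It is not the EFFT code and does not use Intel MKL or Intel Cilk Plus; the function names `fft_recursive` and `dft_naive` are hypothetical and chosen only for illustration. The comments mark where a task-parallel framework could, in principle, run the two independent sub-transforms concurrently and where the inner loop has independent iterations.

```python
import cmath


def fft_recursive(x):
    """Radix-2 decimation-in-time Cooley-Tukey FFT (illustrative sketch only).

    The input length must be a power of two.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]

    # The two half-size sub-transforms do not depend on each other.
    # Assumption: this is the kind of place where a task-parallel
    # framework could spawn one branch and continue with the other.
    even = fft_recursive(x[0::2])
    odd = fft_recursive(x[1::2])

    half = n // 2
    out = [0j] * n
    # Butterfly loop: every iteration writes to distinct output slots,
    # so the iterations are independent (the loop-centric part).
    for k in range(half):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + half] = even[k] - twiddle
    return out


def dft_naive(x):
    """Direct O(n^2) evaluation of the DFT definition, used as a reference."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]
```

A quick check is to compare `fft_recursive(x)` against `dft_naive(x)` for a short input such as a 16-element list; the two should agree up to floating-point rounding. A production implementation would also have to address the cache, bandwidth and vectorization concerns named in the abstract, which this sketch ignores.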

Journal: Parallel Computing
Publication Date: Oct 1, 2015
Citations: 5

Similar Papers

Multi-Core Program Optimization: Parallel Sorting Algorithms in Intel Cilk Plus
Sabahat Saleem ... Abou Bakar Nauman
International Journal of Hybrid Information Technology | VOL. 7
31 Mar 2014

The Exploration of Pervasive and Fine-Grained Parallel Model Applied on Intel Xeon Phi Coprocessor
Christophe Calvin ... Fan Ye
01 Oct 2013

Memory-mapping support for reducer hyperobjects
I-Ting Angelina Lee ... Charles E Leiserson
25 Jun 2012

High Performance Optimization of Independent Component Analysis Algorithm for EEG Data
Anna Gajos-Balińska ... Przemysław Stpiczyński
01 Jan 2018
