Abstract

The Fast Fourier Transform (FFT) is a fundamental operation for 2D data in various applications. To accelerate large-scale 2D-FFT computation, we propose a Heterogeneous parallel In-place 2D-FFT algorithm, HI-FFT. Our novel work decomposition method makes it possible to run our parallel algorithm on the original data (i.e., in-place), unlike prior parallel algorithms that require additional memory space (i.e., out-of-place) to guarantee independence among sub-tasks. Our work decomposition method also removes the duplicated operations of the out-of-place approaches. Using our decomposition method, we introduce an in-place heterogeneous parallel algorithm that utilizes both the multi-core CPU and the GPU simultaneously. To maximize the utilization of the computing resources, we also propose a priority-based dynamic scheduling method. We compared the performance of seven different 2D-FFT algorithms, including ours, for large-scale 2D-FFT problems whose sizes varied from 20K² to 120K². We found that our method achieved up to 2.92 and 4.42 times higher performance than the conventional homogeneous parallel algorithms based on the state-of-the-art CPU and GPU libraries, respectively. Our method also showed up to 2.27 times higher performance than the prior heterogeneous algorithms while requiring half the memory space. To check the benefit of HI-FFT in an actual application, we applied it to a CGH (Computer Generated Holography) process and found that it successfully reduces the hologram generation time. These results demonstrate the advantage of our approach for large-scale 2D-FFT computation.

Highlights

  • The Discrete Fourier Transform (DFT) is one of the fundamental operations in the scientific and engineering domains [1]

  • We implemented our HI-FFT algorithm on three different heterogeneous machines consisting of multi-core CPU(s) and a GPU (Table 1)

  • We found that our scheduling algorithm, which dynamically controls the workload for available computing resources, allowed HI-FFT to achieve stable and high performance regardless of system configuration
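
To illustrate why dynamic workload control can matter on a machine with unequal devices, the following is a minimal simulation. It is not the paper's scheduling method. It assumes two hypothetical workers with fixed relative speeds, equal-sized tasks, and a "largest task first" priority chosen purely for illustration. It compares a fixed up-front split against a policy where whichever worker becomes idle first takes the next task.

```python
import heapq

def static_split(task_sizes, speeds):
    """Assign tasks round-robin up front; return the overall finish time."""
    finish = [0.0] * len(speeds)
    for i, size in enumerate(task_sizes):
        w = i % len(speeds)
        finish[w] += size / speeds[w]
    return max(finish)

def dynamic_schedule(task_sizes, speeds):
    """Whichever worker becomes idle first takes the largest remaining task."""
    # Max-heap of tasks by size (hypothetical priority, not from the paper).
    tasks = [-s for s in task_sizes]
    heapq.heapify(tasks)
    # Min-heap of (time at which the worker becomes idle, worker id).
    idle = [(0.0, w) for w in range(len(speeds))]
    heapq.heapify(idle)
    makespan = 0.0
    while tasks:
        size = -heapq.heappop(tasks)
        t, w = heapq.heappop(idle)
        t += size / speeds[w]
        makespan = max(makespan, t)
        heapq.heappush(idle, (t, w))
    return makespan

# Hypothetical setup: worker 0 is slow (speed 1.0), worker 1 is fast (speed 4.0).
sizes = [4] * 8
speeds = [1.0, 4.0]
static_time = static_split(sizes, speeds)
dynamic_time = dynamic_schedule(sizes, speeds)
```

In this toy setup the fixed split leaves the fast worker idle while the slow one finishes its share, whereas the pull-based policy lets the fast worker absorb more tasks. Real systems would also have to account for data transfer costs, which this sketch ignores.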


Summary

INTRODUCTION

The Discrete Fourier Transform (DFT) is one of the fundamental operations in the scientific and engineering domains [1]. Hybrid CPU/GPU algorithms have been proposed, and they solved the limited GPU memory issue by keeping all of the data in CPU memory and sending parts of it to GPU memory [18]. They utilized both the multi-core CPU and the GPU for the computation to obtain further performance improvements [19], [20]. A line-based work distribution requires a copy of the matrix to guarantee independence for parallel processing, and this leads to redundant operations. To solve these memory and computational overheads of the line-based approach, we propose a novel work decomposition method that divides the 2D-FFT computation work into sub-tasks whose workspaces are disjoint (Sec. III). The multi-core CPU-only version of our method (i.e., HI-FFT_CPU) achieved better performance than the conventional CPU parallel algorithms using a line-based work decomposition approach. These results demonstrate the advantages of our approach in terms of both memory usage and computational performance.
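
For readers unfamiliar with the line-based view of a 2D transform, the sketch below shows the standard row-column decomposition: a 1D transform on every row, followed by a 1D transform on every column. It is a sequential, dependency-free illustration that overwrites the input matrix, and it does not reproduce the paper's work decomposition or its parallelization. The function names are hypothetical, and a naive O(n²) DFT stands in for an FFT to keep the code short.

```python
import cmath

def dft(x):
    """Naive 1D DFT of a sequence (stand-in for a real FFT routine)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def dft2_inplace(a):
    """Row-column 2D DFT that overwrites the matrix `a` (a list of lists)."""
    rows, cols = len(a), len(a[0])
    # Pass 1: transform each row; rows do not depend on one another.
    for r in range(rows):
        a[r][:] = dft(a[r])
    # Pass 2: transform each column of the row-transformed data,
    # using one temporary line buffer per column.
    for c in range(cols):
        col = dft([a[r][c] for r in range(rows)])
        for r in range(rows):
            a[r][c] = col[r]
    return a

def dft2_direct(a):
    """Direct evaluation of the 2D DFT definition, used only as a check."""
    rows, cols = len(a), len(a[0])
    return [[sum(a[m][n] * cmath.exp(-2j * cmath.pi * (u * m / rows + v * n / cols))
                 for m in range(rows) for n in range(cols))
             for v in range(cols)] for u in range(rows)]

data = [[1, 2, 3], [4, 5, 6]]
expected = dft2_direct(data)
result = dft2_inplace([row[:] for row in data])
max_error = max(abs(result[u][v] - expected[u][v])
                for u in range(2) for v in range(3))
```

Each pass consists of independent 1D transforms, which is what makes a line-based distribution natural; the difficulty the text describes arises when those lines are processed concurrently on shared storage.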

RELATED WORK
GENERAL PROCESS OF 2D-FFT COMPUTATION
HI-FFT FRAMEWORK
TASK GENERATION ALGORITHM
PRIORITY-BASED DYNAMIC SCHEDULING
Run all slave workers in parallel
RESULTS AND ANALYSIS
PERFORMANCE ANALYSIS
CONCLUSIONS AND FUTURE WORK