Abstract
We propose a novel graphics processing unit (GPU) algorithm that can handle a large-scale 3D fast Fourier transform (3D-FFT) problem whose data size exceeds the GPU's memory. To work around the limited device memory, we compute the 3D-FFT with a 1D FFT-based approach. Moreover, to reduce the communication overhead between the CPU and GPU, we propose a 3D data-transposition method that converts the target 1D vector into a contiguous memory layout and improves data transfer efficiency. The transposed data are transferred efficiently between the host and device memories through a pinned buffer and multiple streams. We apply our method to various large-scale benchmarks and compare its performance with a state-of-the-art multicore CPU FFT library, the Fastest Fourier Transform in the West (FFTW), and with a prior GPU-based 3D-FFT algorithm. Our method outperforms FFTW by up to 2.89 times, and the performance gap widens as the data size increases. The performance of the prior GPU algorithm decreases considerably on massive-scale problems, whereas our method's performance remains stable.
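The idea of building a 3D-FFT from 1D FFTs over contiguous, batch-sized buffers can be illustrated with a minimal CPU-only sketch. It is not the authors' GPU implementation: the function names `fft1d` and `fft3d_by_1d`, the `batch` parameter, and the row-major flat layout are assumptions made for illustration, and the pinned buffer and multiple streams are not modeled. The gather step stands in for the data transposition, and the contiguous buffer `buf` stands in for the chunk that would be sent to the device.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft3d_by_1d(data, nx, ny, nz, batch=4):
    """3D FFT of a flat row-major array, index = (x * ny + y) * nz + z.

    Hypothetical sketch: `batch` is the number of 1D lines processed at once,
    mimicking a chunk small enough to fit in limited device memory.
    """
    out = list(data)
    dims = (nx, ny, nz)
    strides = (ny * nz, nz, 1)
    total = nx * ny * nz
    for axis in range(3):
        n, stride = dims[axis], strides[axis]
        # Start offsets of every 1D line along this axis.
        starts = [i for i in range(total) if (i // stride) % n == 0]
        for b in range(0, len(starts), batch):
            chunk = starts[b:b + batch]
            # "Transpose": gather strided lines into one contiguous buffer.
            buf = [out[s + j * stride] for s in chunk for j in range(n)]
            # Stand-in for the device step: batched 1D FFTs on the buffer.
            for c in range(len(chunk)):
                buf[c * n:(c + 1) * n] = fft1d(buf[c * n:(c + 1) * n])
            # Scatter the transformed lines back to the original layout.
            for c, s in enumerate(chunk):
                for j in range(n):
                    out[s + j * stride] = buf[c * n + j]
    return out
```

As a quick check, transforming a unit impulse, `fft3d_by_1d([1 + 0j] + [0j] * 15, 2, 4, 2)`, should return sixteen values equal to one, since the transform of an impulse at the origin is constant. In this sketch only `batch` lines are held in the working buffer at a time, which is the property that lets the same pattern scale to data larger than the working memory.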