Abstract
In many research areas, one encounters convolution products that relate a physical quantity at a set of observation points to its sources. When the sources and the observation points coincide, direct numerical evaluation of the physical quantity typically leads to numerical problems of order N². Fast Fourier transforms (FFTs) are widely used to reduce the computation to order N log N complexity. When FFTs are applied to finite physical problems, zero padding is required. Hence, in 2D and 3D problems, the evaluation of the convolution product can be optimized by not executing Fourier transforms on arrays that contain only zeros in the forward 2D or 3D FFT scheme, nor on the corresponding arrays in the inverse 2D or 3D FFT scheme. This paper describes the implementation of such an approach on graphics processing units (GPUs) and compares the time gains on GPU and on CPU. It is found that on CPU the speedup corresponds with the theoretical limit, while in the GPU implementation the memory bandwidth limits the speedup ratio. Copyright © 2013 John Wiley & Sons, Ltd.
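As an illustration of the idea of skipping transforms on all-zero arrays, the following is a minimal CPU sketch in Python with NumPy for the 2D case. It is not the implementation described in the paper. The function names (`padded_kernel_spectrum`, `conv2d_pruned`, `conv2d_direct`), the doubling of each dimension for zero padding, and the example kernel `g` are assumptions made here for illustration only.

```python
import numpy as np

def padded_kernel_spectrum(g, n0, n1):
    """FFT of the kernel g(d0, d1), stored in wrap-around order on a 2*n0 x 2*n1 grid."""
    d0 = np.arange(2 * n0)
    d0[d0 >= n0] -= 2 * n0          # offsets 0..n0-1, then -n0..-1
    d1 = np.arange(2 * n1)
    d1[d1 >= n1] -= 2 * n1
    D0, D1 = np.meshgrid(d0, d1, indexing="ij")
    return np.fft.fft2(g(D0, D1))

def conv2d_pruned(src, kernel_hat):
    """Zero-padded FFT convolution that skips 1D transforms on all-zero rows."""
    n0, n1 = src.shape
    # Forward, step 1: transform along axis 1 only for the n0 rows holding data.
    # The n0 padding rows are all zero, so their transforms are skipped.
    tmp = np.zeros((2 * n0, 2 * n1), dtype=complex)
    tmp[:n0, :] = np.fft.fft(src, n=2 * n1, axis=1)
    # Forward, step 2: transform along axis 0 for every column.
    spec = np.fft.fft(tmp, axis=0)
    # Pointwise product in Fourier space.
    spec *= kernel_hat
    # Inverse, step 1: transform along axis 0 for every column.
    tmp = np.fft.ifft(spec, axis=0)
    # Inverse, step 2: only the first n0 rows correspond to observation
    # points, so the row transforms of the padding rows are skipped.
    out = np.fft.ifft(tmp[:n0, :], axis=1)
    return out[:, :n1].real

def conv2d_direct(src, g):
    """Reference direct summation of order N^2, for checking small cases."""
    n0, n1 = src.shape
    out = np.zeros((n0, n1))
    for i in range(n0):
        for j in range(n1):
            for k in range(n0):
                for l in range(n1):
                    out[i, j] += g(i - k, j - l) * src[k, l]
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.standard_normal((4, 5))
    g = lambda a, b: 1.0 / (1.0 + a * a + b * b)   # hypothetical kernel
    k_hat = padded_kernel_spectrum(g, 4, 5)
    print(np.allclose(conv2d_pruned(src, k_hat), conv2d_direct(src, g)))
```

In this sketch the forward and inverse 2D transforms each perform n0 + 2·n1 one-dimensional FFTs instead of 2·n0 + 2·n1. The same pruning extends to 3D by skipping the all-zero lines and planes in each successive dimension.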
More From: International Journal of Numerical Modelling: Electronic Networks, Devices and Fields