Abstract

Kernel superposition, where an image is convolved with a spatially varying kernel, is commonly used in optics, astronomy, medical imaging and radiotherapy. This operation is computationally expensive and generally cannot benefit from the mathematical simplifications available for true convolutions. We systematically evaluated the performance of a number of implementations of a 2D Gaussian kernel superposition on several graphics processing units of two recent architectures. The 2D Gaussian kernel was used because of its importance in real-life applications and representativeness of expensive-to-evaluate, separable kernels. The implementations were based both on the gather approach found in the literature and on the scatter approach presented here. Our results show that, over a range of kernel sizes, the scatter approach delivers speedups of 2.1–14.5 or 1.3–4.9 times, depending on the architecture. These numbers were further improved to 4.8–28.5 and 3.7–16.8 times, respectively, when only “exact” implementations were compared. Speedups similar to those presented are expected for other separable kernels and, we argue, will also remain applicable for problems of higher dimensionality.
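
To illustrate the gather/scatter distinction the abstract refers to, below is a minimal, hypothetical CUDA sketch of a spatially varying 2D Gaussian superposition; it is not the paper's implementation. The fixed half-width `R`, the per-pixel `sigma` map, and the normalisation are illustrative assumptions, and the loops are written in naive non-separable form for clarity. In the gather version each thread computes one output pixel; in the scatter version each thread spreads one input pixel's contribution and resolves write conflicts with `atomicAdd`.

```cuda
// Hypothetical sketch (not the paper's code): gather vs. scatter superposition
// of a spatially varying 2D Gaussian kernel. R and the sigma map are assumptions.
#include <cuda_runtime.h>
#include <math.h>

#define R 8  // assumed kernel half-width in pixels; real support depends on sigma

// Gather: each thread owns one OUTPUT pixel and sums contributions from all
// input pixels whose (spatially varying) Gaussian reaches it.
__global__ void superposition_gather(const float* in, const float* sigma,
                                     float* out, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    float acc = 0.0f;
    for (int dy = -R; dy <= R; ++dy) {
        for (int dx = -R; dx <= R; ++dx) {
            int sx = x + dx, sy = y + dy;
            if (sx < 0 || sx >= w || sy < 0 || sy >= h) continue;
            // The kernel width is a property of the SOURCE pixel.
            float s = sigma[sy * w + sx];
            float wgt = expf(-(dx * dx + dy * dy) / (2.0f * s * s))
                        / (6.2831853f * s * s);
            acc += in[sy * w + sx] * wgt;
        }
    }
    out[y * w + x] = acc;
}

// Scatter: each thread owns one INPUT pixel and spreads its value over the
// neighbouring output pixels; concurrent writes are resolved with atomicAdd.
__global__ void superposition_scatter(const float* in, const float* sigma,
                                      float* out, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    float v = in[y * w + x];
    float s = sigma[y * w + x];
    float norm = 1.0f / (6.2831853f * s * s);
    for (int dy = -R; dy <= R; ++dy) {
        for (int dx = -R; dx <= R; ++dx) {
            int tx = x + dx, ty = y + dy;
            if (tx < 0 || tx >= w || ty < 0 || ty >= h) continue;
            float wgt = norm * expf(-(dx * dx + dy * dy) / (2.0f * s * s));
            atomicAdd(&out[ty * w + tx], v * wgt);
        }
    }
}
```

A practical implementation would likely exploit the separability of the Gaussian (row pass followed by column pass) and use shared memory to amortise redundant loads; the sketch above only shows the data-flow difference between the two approaches.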
