Abstract

Graphics processing units (GPUs) are highly scalable parallel computing platforms. A GPU contains thousands of cores along with several types of memory spaces of varying bandwidths, and the maximum throughput of a GPU computation depends on the efficient use of these memory types. This paper presents research involving 12 different kernels for solving the standard Laplace equation in three dimensions, each using a unique memory access pattern. Benchmarks have been established for this problem, and a novel efficient kernel is proposed after in-depth analysis. A throughput of more than 50 giga floating-point operations per second (GFLOPS) has been obtained on an average GPU as a consequence of optimizing the memory access path. The best approach achieves a speedup of about 70 on the GPU in comparison to a CPU.
