Abstract

Graphics processing units (GPUs) are highly scalable parallel computing platforms. A GPU contains thousands of cores along with several types of memory spaces of varying bandwidths, and the maximum throughput of a GPU computation depends on the efficient use of these memory types. This paper presents research involving 12 different kernels for solving the standard Laplace equation in three dimensions, each using a unique memory access pattern. Benchmarks have been established for this problem, and a novel efficient kernel is proposed after in-depth analysis. A throughput of more than 50 giga floating-point operations per second (GFLOPS) has been obtained on an average GPU as a consequence of optimizing the memory access path. The best approach achieves a speedup of about 70 on the GPU in comparison to a CPU.
