Exploiting GPU memory hierarchy for accelerating a specialized stencil computation

Thanasekhar Balaiah, Ranjani Parthasarathi

doi:10.1002/cpe.4267

Abstract

Stencil computations are an important class of problems that can benefit from graphics processing units (GPUs). However, because GPU memory is organized hierarchically, with blocked on-chip memory, memory performance degrades for certain stencil data access patterns. An appropriate data layout is therefore needed to use the different levels of memory effectively and harvest the full potential of GPUs. In this context, a specialized stencil computation, the lattice Boltzmann method (LBM), which has a complex neighborhood relationship along with loop-carried dependence, is taken as a strong case study. Four approaches to the LBM are developed in this work by exploiting the memory hierarchy with new data layouts and kernel organizations. Their primary aim is to increase the compute to global memory access ratio and to reduce the overall read-write latency, even at the expense of additional computations. The proposed techniques were tested on NVIDIA TitanX, GTX 960, GTX 740Ti, and GTX 650Ti GPUs. The compute to global memory access ratio improves by 2 to 10 times over the naive solutions in this work. The performance, in terms of time taken per iteration, improves by up to 3.7 times. The million lattice units per second achieved for both the 2DQ9 and 3DQ19 models improve by more than 2 times.
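
To make the data-layout point concrete, the following minimal Python sketch contrasts two common ways of storing the nine distribution values of a 2DQ9 lattice in one flat array, and computes the million lattice units per second figure of merit. It is illustrative only. The function names (aos_index, soa_index, mlups), the grid size, and the timing are hypothetical and not taken from the paper. The two layouts are generic textbook layouts, not the four approaches the authors propose, and the throughput formula assumes the conventional definition: lattice nodes times iterations, divided by elapsed seconds times 10^6.

```python
Q = 9  # number of discrete velocity directions in a 2DQ9 lattice


def aos_index(x, y, q, nx):
    """Array-of-structures: the Q values of one lattice node are contiguous."""
    return (y * nx + x) * Q + q


def soa_index(x, y, q, nx, ny):
    """Structure-of-arrays: one direction's values for all nodes are contiguous."""
    return q * (nx * ny) + y * nx + x


def mlups(nx, ny, iterations, seconds):
    """Throughput in million lattice-node updates per second (assumed definition)."""
    return nx * ny * iterations / (seconds * 1e6)


if __name__ == "__main__":
    nx, ny = 1024, 1024  # hypothetical grid size

    # Distance in elements between what two neighbouring threads (x = 0 and
    # x = 1) would read for the same direction q = 3.
    aos_stride = aos_index(1, 0, 3, nx) - aos_index(0, 0, 3, nx)
    soa_stride = soa_index(1, 0, 3, nx, ny) - soa_index(0, 0, 3, nx, ny)
    print("AoS stride:", aos_stride)
    print("SoA stride:", soa_stride)

    # Hypothetical run: 1000 iterations in 2.0 seconds.
    print("MLUPS:", mlups(nx, ny, 1000, 2.0))
```

In the structure-of-arrays layout, adjacent threads reading the same direction touch adjacent elements, which is the access pattern that GPU global memory coalesces well. In the array-of-structures layout the same reads are nine elements apart.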

Journal: Concurrency and Computation: Practice and Experience
Publication Date: Aug 31, 2017
Citations: 1

Similar Papers

Patient-specific modelling of pulmonary airflow using GPU cluster for the application in medical practice
T Miki ... T Yamaguchi
Computer Methods in Biomechanics and Biomedical Engineering | VOL. 15
02 Aug 2011

Implementation and Optimization of Lattice Boltzmann Method for Fluid Flow on GPU with CUDA
Zhangrong Qin ... Haiyan Liu
International Journal of Digital Content Technology and its Applications | VOL. 6
31 Jul 2012

FULL GPU Implementation of Lattice-Boltzmann Methods with Immersed Boundary Conditions for Fast Fluid Simulations
...
The International Journal of Multiphysics | VOL. 11
31 Mar 2017

Numerical simulation of a 2D electrothermal pump by lattice Boltzmann method on GPU
Qinlong Ren ... Cho Lik Chan
Numerical Heat Transfer, Part A: Applications | VOL. 69
23 Mar 2016
