Abstract
In this work, we present an exhaustive performance analysis of integrating boundary conditions into a unified CUDA kernel for a lattice Boltzmann shallow water solver. The kernel is implemented using the pull scheme of the lattice Boltzmann method. The analysis is performed by simulating open ocean domains with open and bounce-back boundary conditions. The treatment of boundary conditions is divided into two steps: identifying the class of each distribution function component in a node, and handling the resulting branches. Several methods are proposed for each step, and every combination of them is tested across different hardware, domain sizes, and floating-point precisions. Results show that high performance is achieved when two precomputed binary values are used for class identification, while handling branches with Boolean multiplication should be avoided. A full report of the MLUPS (Millions of Lattice Updates Per Second) rate achieved in each test is presented.
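The two steps described above can be illustrated with a small CPU sketch. It is written in Python rather than CUDA so that it runs anywhere, and it is an interpretation under stated assumptions, not the kernel from the paper: the one-dimensional three-velocity lattice, the placement of the wall and the open boundary, the zero-gradient placeholder used for the open condition, and all function names (`precompute_masks`, `pull_branching`, `pull_boolean`) are hypothetical. Class identification is done with two precomputed bitmasks per node, and the same pull step is then written once with explicit branches and once with Boolean multiplication.

```python
# Hypothetical 1D lattice with three velocities: direction index -> velocity.
C = (0, 1, -1)
# Index of the opposite direction, used by bounce-back.
OPP = (0, 2, 1)


def precompute_masks(n):
    """Return two bitmasks per node: bit i set means component i is
    bounce-back (first list) or open (second list)."""
    bb = [0] * n
    op = [0] * n
    for x in range(n):
        for i, c in enumerate(C):
            src = x - c  # pull scheme: component i arrives from node x - c_i
            if src < 0:  # assumption: solid wall on the left end
                bb[x] |= 1 << i
            elif src >= n:  # assumption: open boundary on the right end
                op[x] |= 1 << i
    return bb, op


def pull_branching(f, bb, op):
    """Pull step that selects the source of each component with if/else."""
    n = len(f)
    out = [[0.0] * len(C) for _ in range(n)]
    for x in range(n):
        for i, c in enumerate(C):
            if (bb[x] >> i) & 1:
                out[x][i] = f[x][OPP[i]]  # bounce-back: reflect at the node
            elif (op[x] >> i) & 1:
                out[x][i] = f[x][i]  # placeholder zero-gradient open condition
            else:
                out[x][i] = f[x - c][i]  # interior: pull from the neighbour
    return out


def pull_boolean(f, bb, op):
    """Same pull step, with the class flags used as 0/1 multipliers."""
    n = len(f)
    out = [[0.0] * len(C) for _ in range(n)]
    for x in range(n):
        for i, c in enumerate(C):
            is_bb = (bb[x] >> i) & 1
            is_op = (op[x] >> i) & 1
            is_in = 1 - is_bb - is_op
            # Clamp so the read that is multiplied by zero stays in bounds.
            src = min(max(x - c, 0), n - 1)
            out[x][i] = (is_bb * f[x][OPP[i]]
                         + is_op * f[x][i]
                         + is_in * f[src][i])
    return out
```

In this sketch both variants return identical values. The Boolean form removes the conditional but reads and multiplies all three candidate values for every component, including the two that are discarded.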