A cache‐efficient implementation of the lattice Boltzmann method for the two‐dimensional diffusion equation

A C Velivelli, K M Bryden

doi:10.1002/cpe.868

Abstract

The lattice Boltzmann method is an important technique for the numerical solution of partial differential equations because it has nearly ideal scalability on parallel computers for many applications. However, to achieve the scalability and speed potential of the lattice Boltzmann technique, the issue of data reuse in cache‐based computer architectures must be addressed. Using the two‐dimensional diffusion equation, $T_t=\mu(T_{xx}+T_{yy})$, this paper examines cache optimization for the lattice Boltzmann method in both serial and parallel implementations. In this study, speedups due to cache optimization were found to be 1.9–2.5 for the serial implementation and 3.6–3.8 for the parallel case in which the domain decomposition was optimized for stride‐one access. In the parallel non‐cached implementation, the method of domain decomposition (horizontal or vertical) used for parallelization did not significantly affect the compute time. In contrast, the cache‐based implementation of the lattice Boltzmann method was significantly faster when the domain decomposition was optimized for stride‐one access. Additionally, the cache‐optimized lattice Boltzmann method in which the domain decomposition was optimized for stride‐one access displayed superlinear scalability for all problem sizes as the number of processors was increased. Copyright © 2004 John Wiley & Sons, Ltd.
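
To illustrate the stride‐one access pattern mentioned above, the following is a minimal Python sketch and not the authors' implementation. It assumes a four‐velocity (D2Q4) BGK scheme with equilibrium $T/4$, periodic boundaries, and flat row‐major arrays; the abstract does not specify the discretization, boundary conditions, or cache‐blocking strategy actually used. The names `lbm_step`, `temperature`, and `run_demo` are hypothetical. The relaxation time `tau` sets the diffusivity.

```python
# Hypothetical sketch: one D2Q4 BGK lattice Boltzmann scheme for
# T_t = mu (T_xx + T_yy). Periodic boundaries and all names are assumptions.

E = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # lattice velocities (dx, dy)


def lbm_step(f, nx, ny, tau):
    """One collide-and-stream update on flat row-major arrays f[i][y*nx + x]."""
    new = [[0.0] * (nx * ny) for _ in E]
    for y in range(ny):              # outer loop walks over rows
        row = y * nx
        for x in range(nx):          # inner loop reads f[i][k] with stride one
            k = row + x
            T = f[0][k] + f[1][k] + f[2][k] + f[3][k]
            feq = 0.25 * T           # equal-weight equilibrium
            for i, (dx, dy) in enumerate(E):
                post = f[i][k] - (f[i][k] - feq) / tau   # BGK collision
                xd = (x + dx) % nx                       # periodic wrap in x
                yd = (y + dy) % ny                       # periodic wrap in y
                new[i][yd * nx + xd] = post              # streaming
    return new


def temperature(f, nx, ny):
    """Recover the scalar field T as the sum of the four distributions."""
    return [sum(f[i][k] for i in range(4)) for k in range(nx * ny)]


def run_demo(nx=16, ny=16, tau=1.0, steps=20):
    """Diffuse a unit pulse placed at the centre of the grid."""
    f = [[0.0] * (nx * ny) for _ in E]
    k0 = (ny // 2) * nx + nx // 2
    for i in range(4):
        f[i][k0] = 0.25              # pulse initialised at equilibrium
    for _ in range(steps):
        f = lbm_step(f, nx, ny, tau)
    return temperature(f, nx, ny)
```

In this layout the inner loop over `x` touches consecutive memory locations, which is the access order a row‐wise (horizontal) domain decomposition would preserve in a compiled language; swapping the two loops would read the same data with stride `nx`. Pure Python does not expose cache effects directly, so the sketch shows only the loop ordering, not the reported speedups.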
