Towards Optimal Performance for Lattice Boltzmann Applications on Terascale Computers

G Wellein, P Lammers, G Hager, S Donath, T Zeiser

doi:10.1016/b978-044452206-1/50005-7

Abstract

This chapter discusses the performance of the lattice Boltzmann method (LBM) on commodity “off-the-shelf” clusters with Intel Xeon processors, tailored high-performance computing (HPC) systems, and a NEC SX8 vector system. It describes the main architectural differences and comments on single-processor performance as well as optimization strategies. It also evaluates the parallel performance of a large-scale simulation running on up to 2000 processors and providing 2 TFlop/s of sustained performance. In the past decade, the LBM has been established as an alternative for the numerical simulation of incompressible flows. One major reason for the success of LBM is the simplicity of its core algorithm, which allows both easy adaptation to complex application scenarios and extension to additional physical or chemical effects. Because LBM is a direct method, the use of extensive computer resources is often mandatory. Thus, LBM has attracted a lot of attention in the HPC community. An important feature of many LBM codes is that the core algorithm can be reduced to a few manageable subroutines, facilitating deep performance analysis followed by precise code and data layout optimization.

Full Text