Scalable Parallel Algorithm of Multiple-Relaxation-Time Lattice Boltzmann Method with Large Eddy Simulation on Multi-GPUs

Lei Xu, Wu Zhang, Anping Song

doi:10.1155/2018/1298313

Abstract

The lattice Boltzmann method (LBM) has become an attractive and promising approach in computational fluid dynamics (CFD). In this paper, a parallel algorithm for the D3Q19 multiple-relaxation-time LBM with large eddy simulation (LES) is presented to simulate 3D flow past a sphere on multiple GPUs (graphics processing units). To handle complex boundaries, a method for identifying boundary lattice nodes is devised. A 3D domain decomposition method is applied to improve scalability on a cluster, and an overlapping mode is introduced to hide communication time by dividing each subdomain into two parts: an inner part and an outer part. Numerical results show good agreement with the literature, and 12 Kepler K20M GPUs perform about 5100 million lattice updates per second, which indicates considerable scalability.
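The inner/outer split mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `split_inner_outer`, the `width` parameter, and the one-cell shell thickness are assumptions made for illustration. A common way to use such a split (assumed here, not stated in the source) is to update the outer cells first, start the exchange of their data with neighbouring subdomains, and update the inner cells while that exchange is in flight.

```python
from itertools import product


def split_inner_outer(nx, ny, nz, width=1):
    """Partition the cells of an nx x ny x nz subdomain into two lists.

    outer: cells within `width` layers of any face (their data is needed
           by neighbouring subdomains).
    inner: all remaining cells (independent of the halo exchange).
    `width=1` is an illustrative assumption, not a value from the paper.
    """
    inner, outer = [], []
    for x, y, z in product(range(nx), range(ny), range(nz)):
        on_shell = (
            x < width or x >= nx - width
            or y < width or y >= ny - width
            or z < width or z >= nz - width
        )
        (outer if on_shell else inner).append((x, y, z))
    return inner, outer


# Example: a 4 x 4 x 4 subdomain has a 2 x 2 x 2 interior.
inner, outer = split_inner_outer(4, 4, 4)
print(len(inner), len(outer))  # 8 56
```

The larger the subdomain, the larger the share of inner cells, so there is more computation available to cover the communication of the outer shell.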

Highlights

Driven by the market demand for real-time, high-definition 3D graphics when processing large graphics data sets, the graphics processing unit (GPU) was developed for rendering tasks
Wu and Shao [19] simulated the lid-driven cavity flow with a parallel implementation of the MRT-lattice Boltzmann method (LBM) and compared it with the single-relaxation-time LBM (SRT-LBM)
Because of the size of the problems treated with the LBM, a single GPU cannot handle them; high computing power and large memory space are required

Summary

Introduction

Driven by the market demand for real-time, high-definition 3D graphics when processing large graphics data sets, the graphics processing unit (GPU) was developed for rendering tasks. There are several variations of LBM, including lattice Bhatnagar-Gross-Krook (LBGK) [6] or single-relaxation-time LBM (SRT-LBM) [7], entropic LBM (ELBM) [8], two-relaxation-time LBM (TRT-LBM) [9], and multiple-relaxation-time LBM (MRT-LBM) [10, 11]. Among these methods, MRT-LBM can improve stability and give accurate results in simulations of flows at higher Reynolds numbers [12]. In this paper, the parallel algorithm of MRT-LBM-LES on multi-GPUs is studied. Wu and Shao [19] simulated the lid-driven cavity flow with a parallel implementation of MRT-LBM and compared it with SRT-LBM. Tran et al. developed a high-performance parallelization of LBM on a GPU by reducing the overheads associated with uncoalesced memory accesses and by improving cache locality through a tiling optimization combined with a data layout change [28].
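To make the remark on uncoalesced memory accesses concrete, the sketch below compares two generic ways of storing the 19 distribution values per cell of a D3Q19 lattice. It is a general illustration of why data layout matters on a GPU and does not reproduce the tiling scheme of [28]; the names `aos_index`, `soa_index`, and `neighbour_stride` are hypothetical.

```python
Q = 19  # number of discrete velocities in D3Q19


def aos_index(cell, q, n_cells):
    """Array-of-structures: the Q values of one cell are stored contiguously."""
    return cell * Q + q


def soa_index(cell, q, n_cells):
    """Structure-of-arrays: all cells' values for one direction are contiguous."""
    return q * n_cells + cell


def neighbour_stride(index_fn, q, n_cells):
    """Distance in memory between the elements read by two adjacent threads,
    assuming thread i handles cell i and both read direction q."""
    return index_fn(1, q, n_cells) - index_fn(0, q, n_cells)


n_cells = 1000
print(neighbour_stride(aos_index, 3, n_cells))  # 19
print(neighbour_stride(soa_index, 3, n_cells))  # 1
```

With the structure-of-arrays layout, adjacent threads touch adjacent elements, which is the access pattern GPU hardware can combine into fewer memory transactions; with the array-of-structures layout, the same reads are 19 elements apart.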

MRT-LBM with LES

Multi-GPUs Architecture

MRT-LBM with LES on Multi-GPUs

Numerical Results and Discussion

Conclusion