Abstract

Heterogeneous clusters are a widely used class of supercomputers assembled from different types of computing devices, for instance CPUs and GPUs, which together provide enormous computational potential. Programming them in a scalable way that exploits their maximal performance introduces numerous challenges, such as optimizing for different computing devices, dealing with multiple levels of parallelism, applying different programming models, distributing work, and hiding communication behind computation. We use the lattice Boltzmann method (LBM) for fluid flow as a representative scientific computing application and develop a holistic implementation for large-scale CPU/GPU heterogeneous clusters. We review and combine a set of best practices and techniques, ranging from optimizations for the particular computing devices to the orchestration of tens of thousands of CPU cores and thousands of GPUs. Eventually, we arrive at an implementation that uses all available computational resources for the lattice Boltzmann method operators. Our approach shows excellent scalability, making it future-proof for heterogeneous clusters of upcoming architectures on the exaFLOPS scale. Parallel efficiencies of more than 90 % are achieved, leading to 2604.72 GLUPS (giga lattice-cell updates per second) while utilizing 24,576 CPU cores and 2048 GPUs of the CPU/GPU heterogeneous cluster Piz Daint and computing more than 6.8 × 10⁹ lattice cells.
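As a back-of-the-envelope illustration of the abstract's headline numbers, the following sketch relates throughput in GLUPS to problem size and wall-clock time, and shows the usual weak-scaling efficiency ratio. The helper names are ours, not the paper's.

```python
# Illustrative helpers (our own names, not from the paper).
# GLUPS = giga lattice-cell updates per second.

def glups(num_cells: float, num_steps: int, seconds: float) -> float:
    """Throughput in billions of lattice-cell updates per second."""
    return num_cells * num_steps / seconds / 1e9

def weak_scaling_efficiency(t_base: float, t_scaled: float) -> float:
    """Weak-scaling parallel efficiency: runtime at the base node count
    divided by runtime at the scaled node count, with the per-node
    problem size held fixed (ideal value: 1.0)."""
    return t_base / t_scaled

# At 2604.72 GLUPS, updating 6.8e9 cells once takes roughly
# 6.8e9 / 2604.72e9 ≈ 2.6 ms of wall-clock time.
```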

Highlights

  • Computational fluid dynamics (CFD) allows for the virtual exploration and investigation of fluid flow avoiding costly or impossible lab experiments

  • Since we focus on optimization and parallelization on large-scale CPU/GPU heterogeneous clusters, we omit a discussion on operators for different boundary conditions and refer to [37]

  • We present a holistic scalable implementation approach of the lattice Boltzmann method (LBM) for large-scale CPU/GPU heterogeneous clusters


Introduction

Computational fluid dynamics (CFD) allows for the virtual exploration and investigation of fluid flow, avoiding costly or impossible lab experiments. Due to its algorithmic simplicity and the spatial locality of its stencil-like advection and diffusion operators, the lattice Boltzmann method (LBM) can be parallelized efficiently. This makes the LBM especially suitable for large-scale high performance computing (HPC) systems, providing reasonable computational performance for CFD. Multiple CPUs and GPUs, potentially located in different nodes of the cluster, have to be orchestrated to cooperatively perform the computational tasks. Facing this complexity, programming all these levels of parallelism in a scalable way is a non-trivial task. There are already numerous successful attempts to implement the LBM on single multi- [4] and many-core processors [5,6,7,8,9,10], on small- [11] and large-scale clusters [12,13], and on heterogeneous systems consisting of single [14,15] and multiple nodes [16,17,18]. The performance of the particular LBM kernels, the advantage of a heterogeneous over a homogeneous approach, and the scalability of our implementation are evaluated in Section 5, followed by Section 6, which gives a brief conclusion and summary.
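The algorithmic simplicity and stencil locality mentioned above can be seen in a minimal single-node sketch of one LBM time step (D2Q9 lattice, BGK collision). This is an illustrative toy in NumPy, not the paper's optimized CPU/GPU implementation: collision is purely cell-local, and propagation only touches nearest neighbors, which is what makes the method so amenable to domain decomposition.

```python
import numpy as np

# D2Q9 lattice: 9 discrete velocities and their quadrature weights
c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
w = np.array([4/9] + [1/9] * 4 + [1/36] * 4)

def equilibrium(rho, u):
    """Second-order Maxwell-Boltzmann equilibrium distribution."""
    cu = np.einsum('qd,dxy->qxy', c, u)          # c_i . u per cell
    usq = np.einsum('dxy,dxy->xy', u, u)         # |u|^2 per cell
    return w[:, None, None] * rho * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)

def lbm_step(f, tau=0.6):
    """One collide-and-propagate step on a periodic grid.

    f has shape (9, nx, ny): one population per lattice velocity.
    """
    rho = f.sum(axis=0)                          # density (0th moment)
    u = np.einsum('qd,qxy->dxy', c, f) / rho     # velocity (1st moment)
    f = f - (f - equilibrium(rho, u)) / tau      # BGK collision: cell-local
    # Propagation: each population shifts by its lattice velocity,
    # i.e. a nearest-neighbor stencil access only.
    for q in range(9):
        f[q] = np.roll(f[q], shift=tuple(c[q]), axis=(0, 1))
    return f
```

In the paper's setting, the local collision step stays inside each subdomain, while the nearest-neighbor propagation across subdomain boundaries is what drives the halo exchange between CPUs and GPUs.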

Related Work
The Lattice Boltzmann Method
Discretization Schemes
Collision and Propagation
Implementation of the Lattice Boltzmann Method
Optimization and Parallelization on Computing Device Level
Memory Layout Pattern
Lattice Boltzmann Method Kernels for the CPU
Lattice Boltzmann Method Kernels for the GPU
Domain Decomposition and Resource Assignment
Communication Scheme
Results
Performance of the CPU and GPU Kernels
Single Subdomain Results on Heterogeneous Systems
Large-Scale Results on Heterogeneous Systems
Conclusions

