Abstract

Heterogeneous clusters are a widely used class of supercomputers assembled from different types of computing devices, for instance CPUs and GPUs, which together provide enormous computational potential. Programming them in a scalable way that exploits their maximal performance introduces numerous challenges, such as optimizing for different computing devices, dealing with multiple levels of parallelism, applying different programming models, distributing work, and hiding communication behind computation. We use the lattice Boltzmann method (LBM) for fluid flow as a representative scientific computing application and develop a holistic implementation for large-scale CPU/GPU heterogeneous clusters. We review and combine a set of best practices and techniques, ranging from optimizations for the particular computing devices to the orchestration of tens of thousands of CPU cores and thousands of GPUs. Eventually, we arrive at an implementation that uses all available computational resources for the lattice Boltzmann method operators. Our approach shows excellent scalability, making it future-proof for heterogeneous clusters of upcoming architectures on the exaFLOPS scale. Parallel efficiencies of more than 90 % are achieved, leading to 2604.72 GLUPS (giga lattice-cell updates per second) while utilizing 24,576 CPU cores and 2048 GPUs of the CPU/GPU heterogeneous cluster Piz Daint and computing more than 6.8 × 10⁹ lattice cells.
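As a back-of-the-envelope illustration of the abstract's headline numbers, the following sketch relates throughput in GLUPS to problem size and wall-clock time, and shows the usual weak-scaling efficiency ratio. The helper names are ours, not the paper's.

```python
# Illustrative helpers (our own names, not from the paper).
# GLUPS = giga lattice-cell updates per second.

def glups(num_cells: float, num_steps: int, seconds: float) -> float:
    """Throughput in billions of lattice-cell updates per second."""
    return num_cells * num_steps / seconds / 1e9

def weak_scaling_efficiency(t_base: float, t_scaled: float) -> float:
    """Weak-scaling parallel efficiency: runtime at the base node count
    divided by runtime at the scaled node count, with the per-node
    problem size held fixed (ideal value: 1.0)."""
    return t_base / t_scaled

# At 2604.72 GLUPS, updating 6.8e9 cells once takes roughly
# 6.8e9 / 2604.72e9 ≈ 2.6 ms of wall-clock time.
```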

Highlights

  • Computational fluid dynamics (CFD) allows for the virtual exploration and investigation of fluid flow avoiding costly or impossible lab experiments

  • Since we focus on optimization and parallelization on large-scale CPU/GPU heterogeneous clusters, we omit a discussion on operators for different boundary conditions and refer to [37]

  • We present a holistic scalable implementation approach of the lattice Boltzmann method (LBM) for large-scale CPU/GPU heterogeneous clusters


Introduction

Computational fluid dynamics (CFD) allows for the virtual exploration and investigation of fluid flow, avoiding costly or impossible lab experiments. Due to its algorithmic simplicity and the spatial locality of its stencil-like advection and diffusion operators, the lattice Boltzmann method (LBM) can be parallelized efficiently. This makes the LBM especially suitable for large-scale high performance computing (HPC) systems, providing reasonable computational performance for CFD. Multiple CPUs and GPUs, potentially located in different nodes of the cluster, have to be orchestrated to cooperatively perform the computational tasks. Facing this complexity, programming all these levels of parallelism in a scalable way is a non-trivial task. There are already numerous successful attempts to implement the LBM on single multi- [4] and many-core processors [5,6,7,8,9,10], on small- [11] and large-scale clusters [12,13], and on heterogeneous systems consisting of single [14,15] and multiple nodes [16,17,18]. The performance of the particular LBM kernels, the advantage of a heterogeneous over a homogeneous approach, and the scalability of our implementation are evaluated in Section 5, followed by Section 6, which gives a brief conclusion and summary.
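The algorithmic simplicity and stencil locality mentioned above can be seen in a minimal single-node sketch of one LBM time step (D2Q9 lattice, BGK collision). This is an illustrative toy in NumPy, not the paper's optimized CPU/GPU implementation: collision is purely cell-local, and propagation only touches nearest neighbors, which is what makes the method so amenable to domain decomposition.

```python
import numpy as np

# D2Q9 lattice: 9 discrete velocities and their quadrature weights
c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
w = np.array([4/9] + [1/9] * 4 + [1/36] * 4)

def equilibrium(rho, u):
    """Second-order Maxwell-Boltzmann equilibrium distribution."""
    cu = np.einsum('qd,dxy->qxy', c, u)          # c_i . u per cell
    usq = np.einsum('dxy,dxy->xy', u, u)         # |u|^2 per cell
    return w[:, None, None] * rho * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)

def lbm_step(f, tau=0.6):
    """One collide-and-propagate step on a periodic grid.

    f has shape (9, nx, ny): one population per lattice velocity.
    """
    rho = f.sum(axis=0)                          # density (0th moment)
    u = np.einsum('qd,qxy->dxy', c, f) / rho     # velocity (1st moment)
    f = f - (f - equilibrium(rho, u)) / tau      # BGK collision: cell-local
    # Propagation: each population shifts by its lattice velocity,
    # i.e. a nearest-neighbor stencil access only.
    for q in range(9):
        f[q] = np.roll(f[q], shift=tuple(c[q]), axis=(0, 1))
    return f
```

In the paper's setting, the local collision step stays inside each subdomain, while the nearest-neighbor propagation across subdomain boundaries is what drives the halo exchange between CPUs and GPUs.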

Related Work
The Lattice Boltzmann Method
Discretization Schemes
Collision and Propagation
Implementation of the Lattice Boltzmann Method
Optimization and Parallelization on Computing Device Level
Memory Layout Pattern
Lattice Boltzmann Method Kernels for the CPU
Lattice Boltzmann Method Kernels for the GPU
Domain Decomposition and Resource Assignment
Communication Scheme
Results
Performance of the CPU and GPU Kernels
Single Subdomain Results on Heterogeneous Systems
Large-Scale Results on Heterogeneous Systems
Conclusions

