Abstract

The paper describes a multilevel MPI+OpenMP+OpenCL parallelization approach that provides complete portability across a wide range of hybrid supercomputer architectures. A parallel CFD algorithm for heterogeneous computing of turbulent flows is presented. It solves the compressible Navier–Stokes equations using a cell-centered finite-volume method with polynomial reconstruction on unstructured hybrid meshes. A two-level partitioning distributes the workload among the computing devices of hybrid nodes. Overlapping communications with computations hides the data transfer costs. Scalability is tested on various HPC systems, including a fat node with 8 GPUs and supercomputers using up to 320 GPUs. Performance is compared for multicore CPUs, Intel Xeon Phi, and various AMD and NVIDIA GPUs. Heterogeneous execution using CPUs and GPUs is studied in detail.
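
The two-level partitioning idea can be illustrated with a minimal Python sketch. It is not the paper's implementation, and it rests on several assumptions: the first level is taken to split mesh cells evenly among nodes, the second level is taken to split each node's share among its devices in proportion to a relative performance weight, and simple contiguous index ranges stand in for a real graph partitioner on an unstructured mesh. The names `split_proportional`, `two_level_partition`, and the device weights are hypothetical.

```python
from typing import Dict, List


def split_proportional(items: List[int], weights: List[float]) -> List[List[int]]:
    """Split items into contiguous chunks whose sizes follow the given weights."""
    total = sum(weights)
    chunks: List[List[int]] = []
    start, acc = 0, 0.0
    for w in weights:
        acc += w
        # Cumulative rounding keeps the chunks contiguous and covers every item.
        end = round(len(items) * acc / total)
        chunks.append(items[start:end])
        start = end
    return chunks


def two_level_partition(
    n_cells: int, n_nodes: int, device_weights: Dict[str, float]
) -> List[Dict[str, List[int]]]:
    """Assumed scheme: level 1 splits cells among nodes, level 2 among devices."""
    cells = list(range(n_cells))
    # Level 1: equal shares per node (assumes identical nodes).
    node_parts = split_proportional(cells, [1.0] * n_nodes)
    names = list(device_weights)
    weights = [device_weights[name] for name in names]
    # Level 2: shares inside a node follow hypothetical device performance weights.
    return [dict(zip(names, split_proportional(part, weights))) for part in node_parts]


# Hypothetical setup: 2 nodes, each with a CPU and a GPU assumed to be 4x faster.
parts = two_level_partition(1000, 2, {"cpu": 1.0, "gpu": 4.0})
print({name: len(ids) for name, ids in parts[0].items()})  # {'cpu': 100, 'gpu': 400}
```

In a real solver the weights would come from measured device performance, and each subdomain would also need its halo cells identified so that boundary data can be exchanged while interior cells are being computed.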
