Abstract

The paper describes a multilevel MPI+OpenMP+OpenCL parallelization approach that provides complete portability across a wide range of hybrid supercomputer architectures. A parallel CFD algorithm for heterogeneous computing of turbulent flows is presented. It solves the compressible Navier–Stokes equations using a cell-centered finite-volume method with polynomial reconstruction on unstructured hybrid meshes. Two-level partitioning distributes the workload among the computing devices of hybrid nodes. Overlapping communication with computation hides the cost of data transfers. Scalability is tested on various HPC systems, including a fat node with 8 GPUs and supercomputers using up to 320 GPUs. Performance is compared for multicore CPUs, Intel Xeon Phi, and various AMD and NVIDIA GPUs. Heterogeneous execution using CPUs and GPUs together is studied in detail.
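The two-level workload distribution mentioned above can be illustrated with a minimal sketch. It is not the paper's partitioner: a real unstructured-mesh code would typically use a graph partitioner to also minimize the interface between subdomains, whereas this example only computes how many cells each device would receive. The function names `split_by_weights` and `two_level_partition`, and the idea of describing each device by a relative performance weight, are assumptions made for illustration.

```python
def split_by_weights(n_items, weights):
    """Split n_items into integer counts proportional to the given weights."""
    total = sum(weights)
    exact = [n_items * w / total for w in weights]
    counts = [int(x) for x in exact]  # round down first
    # Hand out the items lost to rounding, largest fractional part first.
    remainder = n_items - sum(counts)
    order = sorted(range(len(weights)),
                   key=lambda i: exact[i] - counts[i],
                   reverse=True)
    for i in order[:remainder]:
        counts[i] += 1
    return counts


def two_level_partition(n_cells, node_devices):
    """Hypothetical two-level split of mesh cells.

    node_devices holds one list per node with the relative performance
    weight of each device on that node (an assumed input, e.g. measured
    throughput of a CPU versus a GPU).
    Level 1: cells are divided among nodes by total node weight.
    Level 2: each node's share is divided among its devices.
    """
    node_weights = [sum(devices) for devices in node_devices]
    node_counts = split_by_weights(n_cells, node_weights)
    return [split_by_weights(count, devices)
            for count, devices in zip(node_counts, node_devices)]


# Two identical nodes, each with a CPU (weight 1) and a GPU (weight 4).
plan = two_level_partition(1000, [[1.0, 4.0], [1.0, 4.0]])
```

In this sketch `plan` gives, for each node, the number of cells assigned to each of its devices. Splitting by weight is what lets a slower CPU and a faster GPU on the same node finish their shares at roughly the same time.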
