Abstract

A heterogeneous parallel algorithm for simulation of compressible turbulent flows and its portable software implementation are presented. The underlying numerical method is based on a family of higher-accuracy edge-based reconstruction schemes on unstructured mixed-element meshes. The proposed parallel solution can engage a large number of computing devices of most existing computing architectures used in modern supercomputers, including manycore CPUs and GPUs. It supports simultaneous co-execution on CPUs and accelerators. The multilevel parallel algorithm combines MPI for distributing workload among hybrid cluster nodes and between devices inside nodes; OpenMP for manycore CPUs and other supporting devices, such as Intel Xeon Phi; and OpenCL for massively parallel accelerators, such as GPUs of various vendors, including NVIDIA, AMD, and Intel. The main focus is on the adaptation of the numerical method and its computational algorithm to the stream-processing parallel paradigm. The very limited device memory inherent in GPU computing is also taken into account. A detailed description of the parallel algorithm is presented, as well as the techniques used for its efficient parallel implementation. Special attention is paid to implicit time integration with its linear solver and to the calculation of convective fluxes and viscous terms. The use of mixed floating-point precision and the overlapping of communications with computations are also discussed. Parallel performance is demonstrated in practical applications on different kinds of supercomputers using up to 10 thousand cores and multiple GPUs of comparable overall performance.
