Abstract

A parallel finite-volume algorithm based on a cell-centered high-order polynomial scheme for unstructured hybrid meshes is considered. The work focuses on adapting and optimizing the basic operations of the algorithm for different architectures of massively parallel accelerators, including AMD and NVIDIA GPUs. Such an algorithm is especially challenging for GPU architectures. It has a very low FLOP-per-byte ratio, so its performance is dominated by the memory bandwidth rather than by the computing performance of the device. It also has an irregular memory access pattern, since unstructured meshes are used. The calculation of polynomial coefficients and the calculation of convective fluxes through the faces of cells are the most interesting and time-consuming operations of the algorithm. Implementations of these operations for accelerators using OpenCL are considered here in detail. Ways to improve computational efficiency are proposed, and performance measurements reaching up to 160 GFLOPS on a single GPU device are presented.
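
A face-based flux loop on an unstructured mesh shows why such an algorithm is memory-bound. The following is a minimal sketch, not the authors' OpenCL implementation. It assumes a first-order upwind flux of a scalar advected by a constant 2D velocity, whereas the paper's scheme is high order. All function names and the per-face FLOP/byte accounting are hypothetical. Each face gathers a value from one of its two cells through index arrays and scatters the flux back to both cells, which produces the irregular access pattern. Counting operations per face gives an arithmetic intensity far below 1 FLOP/byte, so a simple roofline estimate is bounded by memory bandwidth rather than by peak compute.

```python
# Minimal sketch (assumption): first-order upwind flux of a scalar u advected by a
# constant 2D velocity on an unstructured mesh. All names are hypothetical; this is
# not the high-order OpenCL scheme described in the abstract.


def face_flux_residual(n_cells, face_cells, face_normals, face_areas, u, velocity):
    """Return the net inflow of u into each cell, accumulated face by face.

    face_cells[f]   = (left, right): indices of the two cells sharing face f
    face_normals[f] = unit normal of face f, pointing from left to right
    face_areas[f]   = area (length in 2D) of face f
    """
    residual = [0.0] * n_cells
    for (left, right), (nx, ny), area in zip(face_cells, face_normals, face_areas):
        vn = velocity[0] * nx + velocity[1] * ny      # normal velocity: 2 mul + 1 add
        upwind = u[left] if vn >= 0.0 else u[right]   # indirect gather from one cell
        flux = vn * upwind * area                     # flux left -> right: 2 mul
        residual[left] -= flux                        # indirect scatter: 1 add
        residual[right] += flux                       # indirect scatter: 1 add
    return residual


def face_loop_cost(index_bytes=4, real_bytes=8):
    """Hypothetical per-face (FLOPs, bytes) for the loop above.

    Assumes no cache reuse, i.e. every access goes to main memory.
    """
    flops = 3 + 2 + 2
    bytes_moved = (
        2 * index_bytes           # left and right cell indices
        + 2 * real_bytes          # face normal (nx, ny)
        + real_bytes              # face area
        + real_bytes              # one upwind value of u
        + 2 * 2 * real_bytes      # read-modify-write of two residual entries
    )
    return flops, bytes_moved


def roofline_gflops(peak_gflops, bandwidth_gb_s, intensity):
    """Roofline model: attainable GFLOPS = min(compute peak, bandwidth * FLOP/byte)."""
    return min(peak_gflops, bandwidth_gb_s * intensity)


if __name__ == "__main__":
    flops, nbytes = face_loop_cost()
    ai = flops / nbytes
    # Hypothetical device: 5000 GFLOPS peak, 500 GB/s memory bandwidth.
    print(f"{flops} FLOP / {nbytes} B = {ai:.3f} FLOP/B")
    print(f"roofline bound: {roofline_gflops(5000.0, 500.0, ai):.1f} GFLOPS")
```

On a GPU, the two scatter writes race when faces processed by different work-items share a cell. Common remedies are atomic updates, face coloring, or a cell-based loop that gathers the fluxes of its own faces. The abstract does not state which of these, if any, the authors use.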
