Abstract

A recent trend in modern high-performance computing environments is the introduction of accelerators such as GPUs and the Xeon Phi, i.e., specialized computing devices that are optimized for highly parallel applications and coexist with CPUs. In regular compute-intensive applications with predictable data access patterns, these devices often outperform traditional CPUs by far, relegating the CPUs to pure control functions rather than computation. For irregular applications, however, the gap in relative performance can be much smaller and is sometimes even reversed. Thus, maximizing overall performance in such systems requires making full use of all available computational resources. In this paper, we study the attainable performance of the cell-centered finite volume method on 3D unstructured tetrahedral meshes using heterogeneous systems consisting of CPUs and multiple GPUs. Finite volume methods are widely used numerical strategies for solving partial differential equations; their advantages include built-in support for conservation laws and suitability for unstructured meshes. Our focus lies in demonstrating how a workload distribution that maximizes overall performance can be derived from the actual performance attained by the different computing devices in the heterogeneous environment. We also highlight the dual role of partitioning software in reordering and partitioning the input mesh, thus giving rise to a new combined approach to partitioning.
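
The idea of deriving a workload distribution from attained performance can be illustrated with a minimal sketch. It is not the paper's method: it assumes each device's speed is summarized by a single measured throughput (mesh cells processed per second) and that the time per step is balanced when every device receives a share of cells proportional to that throughput. The function name `proportional_shares`, the device labels, and the throughput numbers are all hypothetical.

```python
def proportional_shares(throughputs, n_cells):
    """Split n_cells among devices in proportion to measured throughput.

    throughputs: dict mapping a device label to cells processed per second
                 (assumed to come from a benchmark run on that device).
    Returns a dict mapping each device label to an integer cell count.
    """
    total = sum(throughputs.values())
    # Ideal (fractional) share: equalizes n_i / rate_i across devices.
    ideal = {dev: n_cells * rate / total for dev, rate in throughputs.items()}
    # Round down, then give the leftover cells to the largest remainders
    # so that the shares add up to exactly n_cells.
    shares = {dev: int(x) for dev, x in ideal.items()}
    leftover = n_cells - sum(shares.values())
    by_remainder = sorted(ideal, key=lambda d: ideal[d] - shares[d], reverse=True)
    for dev in by_remainder[:leftover]:
        shares[dev] += 1
    return shares


# Hypothetical measurements, not taken from the paper.
measured = {"cpu": 1.0e6, "gpu0": 3.0e6, "gpu1": 3.0e6}
shares = proportional_shares(measured, 700_000)
print(shares)
```

In practice, such shares would more likely be handed to a mesh partitioner as target partition weights than used to slice the cell list directly, since the partitioner also has to keep the communication between parts small.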
