
In this paper we compare several runtimes that can be used to offload computationally intensive kernels to Intel Xeon Phi coprocessors. The benchmark application is a stripped-down version of an iterative solver used in the Schur complement finite or boundary element tearing and interconnecting (FETI, BETI) domain decomposition methods. In this benchmark, the sparse solve with local stiffness matrices is replaced by multiplication with dense matrices in order to exploit coalesced memory access patterns. We present offload approaches based on the Intel Language Extension for Offload (LEO), the Hetero Streams Library (hStreams), and Heterogeneous Active Messages (HAM), and compare their performance and ease of use.
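
To illustrate the kind of kernel that replaces the sparse solve, the following is a minimal host-side sketch of a dense matrix-vector product with contiguous, row-major memory access. It is not the benchmark code from the paper: the function name `dense_apply`, the row-major storage layout, and the use of `std::vector` are assumptions made for illustration only, and no offload runtime (LEO, hStreams, or HAM) is invoked here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the paper): computes y = S * x for a dense
// n-by-n matrix S stored contiguously in row-major order. In the benchmark
// described above, a product of this kind stands in for the sparse solve
// with a local stiffness matrix.
std::vector<double> dense_apply(const std::vector<double>& S,
                                const std::vector<double>& x) {
    const std::size_t n = x.size();
    std::vector<double> y(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        double sum = 0.0;
        // The inner loop walks one row of S with unit stride, which is the
        // regular access pattern a dense formulation makes possible.
        for (std::size_t j = 0; j < n; ++j) {
            sum += S[i * n + j] * x[j];
        }
        y[i] = sum;
    }
    return y;
}
```

In an offload setting, a kernel of this shape would be the part shipped to the coprocessor, with the matrix transferred once and reused across solver iterations; how that transfer and launch are expressed is what differs between the runtimes compared.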
