Abstract

Partial Differential Equations (PDEs) are widely used to simulate many scenarios in science and engineering, and are usually solved through iterative techniques (e.g., Jacobi, Gauss–Seidel). These methods produce an approximate solution to the problem based on stencil patterns of computation. The complexity, granularity and dimensionality of the problem require substantial computational resources that regular CPU-based architectures cannot provide. Emerging massively data-parallel architectures, such as Intel Xeon Phi, offer a great opportunity to address challenging problems based on PDEs. However, migrating code to these architectures is not straightforward. To carry out this code modernization, it is essential to identify the key aspects of the code that will determine performance on future hardware generations. In this paper we target (1) scalability with core count, (2) data-parallelism exposure to explore vectorization capabilities, and (3) data-locality aware techniques. These techniques lead to a performance gain of up to 15x for the first generation of Xeon Phi, Knights Corner (KNC), and an additional average 2.5x improvement for Knights Landing (KNL).
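To make the stencil pattern concrete, the following is a minimal sketch of a Jacobi iteration for the 2D Laplace equation on a square grid with fixed boundary values. It is an illustration only, not the code evaluated in the paper: the function names (`make_grid`, `jacobi_step`, `solve_laplace`), the 5-point stencil, the boundary condition and the tolerance are all assumptions chosen for brevity, and it is written in plain Python rather than in the optimized form that a Xeon Phi port would require.

```python
def make_grid(n, top=1.0):
    """Build an n x n grid of zeros whose top boundary row is held at `top`.

    Hypothetical helper: the boundary condition is an assumption for the demo.
    """
    grid = [[0.0] * n for _ in range(n)]
    grid[0] = [top] * n
    return grid


def jacobi_step(grid):
    """Return a new grid after one Jacobi sweep of the 5-point stencil."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # boundary values are copied unchanged
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Each interior point becomes the average of its four neighbours,
            # read from the OLD grid only, so all updates are independent.
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new


def solve_laplace(grid, tol=1e-6, max_iters=10_000):
    """Repeat Jacobi sweeps until the largest point update is below `tol`."""
    iteration = 0
    for iteration in range(1, max_iters + 1):
        new = jacobi_step(grid)
        delta = max(abs(new[i][j] - grid[i][j])
                    for i in range(len(grid))
                    for j in range(len(grid[0])))
        grid = new
        if delta < tol:
            break
    return grid, iteration


if __name__ == "__main__":
    solution, sweeps = solve_laplace(make_grid(8))
    print(f"stopped after {sweeps} sweeps")
```

Because every interior update in a sweep reads only the previous grid, the two inner loops carry no dependencies between iterations. That independence is what allows rows to be distributed across cores and the innermost loop to be vectorized, while the reuse of neighbouring values between adjacent points is what data-locality techniques such as blocking aim to exploit.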
