Abstract

Jacobi iteration based on finite difference and finite element discretization schemes is a typical stencil computation in scientific computing. In this paper, we analyze the parallel optimization of Jacobi iteration in real CFD codes on the Intel Many Integrated Core (MIC) architecture and achieve high performance. Our implementation uses loop fusion, data structure transformation, subroutine and loop unrolling, cache blocking, and other optimization techniques. We also collect hardware performance indicators with open source performance analysis tools to guide and verify the optimization on many-core architectures. Experimental results on an Intel Xeon Phi running in native execution mode show that our Jacobi iteration achieves 83.47% parallel efficiency and a vectorization speedup of 4.73 on a 128 × 128 × 256 grid.
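As an illustration of the kind of kernel discussed, the following is a minimal sketch of one cache-blocked Jacobi sweep in C. It is not the paper's implementation: the 7-point averaging stencil, the flattened array layout, the function names `idx` and `jacobi_sweep_blocked`, and the tile-size parameters `bj` and `bk` are all assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical helper: index into a flattened nx*ny*nz array, x varying fastest. */
static inline size_t idx(size_t i, size_t j, size_t k, size_t nx, size_t ny) {
    return (k * ny + j) * nx + i;
}

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/*
 * One Jacobi sweep of an assumed 7-point averaging stencil over interior points.
 * Reads only from `in` and writes only to `out`, which is what makes it Jacobi
 * rather than Gauss-Seidel. The j and k loops are tiled with sizes bj and bk
 * (both must be >= 1) so that the planes touched by a tile can stay in cache;
 * the i loop is left unit-stride and contiguous to help vectorization.
 */
void jacobi_sweep_blocked(const double *in, double *out,
                          size_t nx, size_t ny, size_t nz,
                          size_t bj, size_t bk) {
    for (size_t kk = 1; kk + 1 < nz; kk += bk) {
        size_t kend = min_sz(kk + bk, nz - 1);
        for (size_t jj = 1; jj + 1 < ny; jj += bj) {
            size_t jend = min_sz(jj + bj, ny - 1);
            for (size_t k = kk; k < kend; ++k) {
                for (size_t j = jj; j < jend; ++j) {
                    for (size_t i = 1; i + 1 < nx; ++i) {
                        out[idx(i, j, k, nx, ny)] =
                            (in[idx(i - 1, j, k, nx, ny)] + in[idx(i + 1, j, k, nx, ny)] +
                             in[idx(i, j - 1, k, nx, ny)] + in[idx(i, j + 1, k, nx, ny)] +
                             in[idx(i, j, k - 1, nx, ny)] + in[idx(i, j, k + 1, nx, ny)]) / 6.0;
                    }
                }
            }
        }
    }
}
```

In a full solver the caller would swap `in` and `out` after each sweep and repeat until convergence. Suitable tile sizes depend on the cache capacity of the target hardware and would need to be tuned by measurement.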
