Development of element-by-element kernel algorithms in unstructured finite-element solvers for many-core wide-SIMD CPUs: Application to earthquake simulation

Kohei Fujita, Masashi Horikoshi, Tsuyoshi Ichimura, Larry Meadows, Kengo Nakajima, Muneo Hori, Lalith Maddegedara

doi:10.1016/j.jocs.2020.101174

Abstract

Accelerating the element-by-element (EBE) kernel in matrix-vector products is essential for high performance in unstructured implicit finite-element applications. However, attaining high performance with the EBE kernel is not straightforward, because its random data access involves data recurrence, which gives rise to data races when results are accumulated in parallel. In this paper, we develop methods to circumvent these data races and achieve high performance on many-core CPU architectures with wide SIMD units. The developed EBE kernel attains 16.3% and 16.0% of FP32 peak performance on the Intel Xeon Phi (Knights Landing) based Oakforest-PACS and the Intel Xeon Platinum (Cascade Lake) based Oakbridge-CX, respectively. On Oakforest-PACS, this corresponds to a 2.88-fold speedup over the baseline kernel and a 2.03-fold speedup of the whole finite-element application. Examples of finite-element earthquake simulations using the developed EBE kernel algorithms are shown. These insights are expected to enable high performance in other unstructured finite-element solvers on large-scale systems based on many-core wide-SIMD CPUs.
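To make the data-race issue concrete, the following is a minimal sketch of an EBE matrix-vector product on a hypothetical 1D mesh of two-node elements. The names `element_matrix`, `ebe_matvec`, and `color_elements` are assumptions introduced for illustration and do not come from the paper. The coloring step shows one common, generic way to group elements so that no two elements in a group touch the same node; it is not necessarily the method developed by the authors.

```python
# Hypothetical 1D mesh: element e connects nodes e and e + 1.
def element_matrix(e):
    # 2x2 stiffness matrix of a unit-length linear element (assumed for illustration).
    return [[1.0, -1.0], [-1.0, 1.0]]


def ebe_matvec(connectivity, x, n_nodes):
    """Compute y = K x element by element, without assembling the global K."""
    y = [0.0] * n_nodes
    for e, nodes in enumerate(connectivity):
        ke = element_matrix(e)
        xe = [x[n] for n in nodes]  # gather: indirect reads from x
        for i, ni in enumerate(nodes):
            # Scatter-add: elements that share node ni all update y[ni].
            # Running these updates in parallel (threads or SIMD lanes)
            # without care is where the data race appears.
            y[ni] += sum(ke[i][j] * xe[j] for j in range(len(nodes)))
    return y


def color_elements(connectivity):
    """Greedy coloring: elements in the same group share no node."""
    colors = []  # each entry: (nodes already used, element ids in this color)
    for e, nodes in enumerate(connectivity):
        for used, members in colors:
            if used.isdisjoint(nodes):
                used.update(nodes)
                members.append(e)
                break
        else:
            colors.append((set(nodes), [e]))
    return [members for _, members in colors]


connectivity = [(e, e + 1) for e in range(4)]  # 4 elements, 5 nodes
x = [0.0, 1.0, 4.0, 9.0, 16.0]
y = ebe_matvec(connectivity, x, n_nodes=5)
groups = color_elements(connectivity)
print(y)       # [-1.0, -2.0, -2.0, -2.0, 7.0]
print(groups)  # [[0, 2], [1, 3]]
```

Within each group returned by `color_elements`, the scatter-add targets are distinct, so the elements of a group could in principle be processed concurrently without conflicting writes. The trade-off is that grouping changes the order in which elements are visited, which tends to affect data locality.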

Full Text