Abstract

Accelerating the element-by-element (EBE) kernel in matrix-vector products is essential for high performance in unstructured implicit finite-element applications. However, attaining high performance with the EBE kernel is not straightforward because of its random data access with data recurrence. In this paper, we develop methods to circumvent the data races caused by this recurrence and thereby attain high performance on many-core CPU architectures with wide SIMD units. The developed EBE kernel attains 16.3% and 16.0% of FP32 peak on the Intel Xeon Phi (Knights Landing) based Oakforest-PACS and the Intel Xeon Platinum (Cascade Lake) based Oakbridge-CX, respectively. This leads to a 2.88-fold speedup over the baseline kernel and a 2.03-fold speedup of the whole finite-element application on Oakforest-PACS. Examples of finite-element earthquake simulations using the developed EBE kernel algorithms are shown. These insights are expected to enable high performance in other unstructured finite-element solvers on large-scale systems based on many-core, wide-SIMD CPUs.
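To make the data-recurrence issue concrete, the following is a minimal sketch of an EBE matrix-vector product, y = sum over elements of P_e^T K_e P_e x, on a toy 1D mesh. It is not the kernel developed in the paper. Element coloring is shown only as one widely used way to avoid write conflicts, and the abstract does not state that the authors use it. All names (`ebe_matvec`, `color_elements`, `ebe_matvec_colored`) and the mesh data are hypothetical.

```python
# Illustrative sketch only: a 1D chain of two-node elements.
# Function names and mesh data are assumptions, not taken from the paper.

def ebe_matvec(elements, elem_mats, x):
    """Compute y = K x element by element, without assembling K."""
    y = [0.0] * len(x)
    for nodes, ke in zip(elements, elem_mats):
        xe = [x[n] for n in nodes]  # gather: indirect (random) reads
        ye = [sum(ke[i][j] * xe[j] for j in range(len(nodes)))
              for i in range(len(nodes))]
        for i, n in enumerate(nodes):
            # scatter-add: neighbouring elements update the same node n,
            # which is the data recurrence that causes races in parallel
            y[n] += ye[i]
    return y


def color_elements(elements):
    """Greedy coloring so that same-colored elements share no node."""
    node_colors = {}  # node -> colors of elements already touching it
    colors = []
    for nodes in elements:
        used = set()
        for n in nodes:
            used |= node_colors.get(n, set())
        c = 0
        while c in used:
            c += 1
        colors.append(c)
        for n in nodes:
            node_colors.setdefault(n, set()).add(c)
    return colors


def ebe_matvec_colored(elements, elem_mats, x):
    """Same product, processed one color group at a time."""
    colors = color_elements(elements)
    y = [0.0] * len(x)
    for c in range(max(colors) + 1):
        # Elements in this group touch disjoint nodes, so this inner loop
        # could be parallelized or vectorized without write conflicts.
        for e, nodes in enumerate(elements):
            if colors[e] != c:
                continue
            ke = elem_mats[e]
            xe = [x[n] for n in nodes]
            for i, n in enumerate(nodes):
                y[n] += sum(ke[i][j] * xe[j] for j in range(len(nodes)))
    return y


# Toy mesh: 4 elements on 5 nodes, each with the 1D Laplacian element matrix.
elements = [(0, 1), (1, 2), (2, 3), (3, 4)]
elem_mats = [[[1.0, -1.0], [-1.0, 1.0]] for _ in elements]
x = [0.0, 1.0, 4.0, 9.0, 16.0]

y_plain = ebe_matvec(elements, elem_mats, x)
y_colored = ebe_matvec_colored(elements, elem_mats, x)
```

In this sketch the two variants give the same result. The colored version only changes the order in which elements are visited, so that each group is free of conflicting writes.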

