Abstract

Accelerators are becoming a key component for improving efficiency in High-Performance Computing (HPC) systems. While GPU-based systems are widely used to accelerate HPC workloads, new systems based on long-vector architectures are rapidly gaining popularity. Optimized math libraries are therefore fundamental to achieving high performance on these emerging vector architectures. This paper focuses on optimizing the HPCG benchmark, which comprises four fundamental kernels found in many numerical applications. We target two relevant long-vector architectures: the NEC Vector Engine and the RISC-V "V" vector extension. Compared to the well-tuned proprietary solution, our open HPCG implementation improves performance by 1.6% on the NEC Vector Engine, and it achieves near-maximum memory bandwidth utilization on the two evaluated RISC-V vector accelerator designs.

