Stencil codes on a vector length agnostic architecture

Adrià Armejach, Helena Caminal, Juan M Cebrian, Chris Adeniyi-Jones, Mateo Valero, Marc Casas, Rekai González-Alberquilla, Miquel Moretó

doi:10.1145/3243176.3243192

Abstract

Data-level parallelism is frequently ignored or underutilized. Achieved through vector/SIMD capabilities, it can provide substantial performance improvements on top of widely used techniques such as thread-level parallelism. However, manual vectorization is a tedious and costly process that needs to be repeated for each specific instruction set or register size. In addition, automatic compiler vectorization is susceptible to code complexity, and is usually limited by data and control dependencies. To address some of these issues, Arm recently released a new vector ISA, the Scalable Vector Extension (SVE), which is Vector-Length Agnostic (VLA). VLA enables the generation of binary files that run regardless of the physical vector register length. In this paper we leverage the main characteristics of SVE to implement and optimize stencil computations, which are ubiquitous in scientific computing. We show that SVE enables easy deployment of textbook optimizations like loop unrolling, loop fusion, load trading or data reuse. Our detailed simulations using vector lengths ranging from 128 to 2,048 bits show that these optimizations can lead to performance improvements over straightforward vectorized code of up to 56.6% for 2,048-bit vectors. In addition, we show that certain optimizations can hurt performance due to a reduction in arithmetic intensity, and provide insight useful for compiler optimizers.
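The vector-length-agnostic idea mentioned in the abstract can be illustrated with a small emulation. The sketch below is not the paper's code: the function names, the 1D 3-point averaging stencil, and the 64-bit element size are assumptions chosen for illustration, and Python slices stand in for predicated vector operations. It only shows that a loop written against a lane count discovered at run time gives the same result for every vector length.

```python
def stencil_scalar(a):
    """Reference 3-point stencil over the interior points; boundaries are copied."""
    out = list(a)
    for i in range(1, len(a) - 1):
        out[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
    return out


def stencil_vla(a, vl_bits, elem_bits=64):
    """Same stencil, strip-mined by a lane count that is only known at run time.

    vl_bits emulates the physical vector register length (an assumption of this
    sketch; real VLA code would query the hardware instead of taking a parameter).
    """
    lanes = vl_bits // elem_bits  # elements processed per emulated vector
    n = len(a)
    out = list(a)
    i = 1
    while i < n - 1:
        # Emulated predicate: only the lanes that still fall inside the interior
        # are active, so no separate scalar tail loop is needed.
        active = min(lanes, (n - 1) - i)
        left = a[i - 1:i - 1 + active]
        mid = a[i:i + active]
        right = a[i + 1:i + 1 + active]
        out[i:i + active] = [(l + m + r) / 3.0 for l, m, r in zip(left, mid, right)]
        i += lanes  # advance by one full emulated vector
    return out
```

Calling `stencil_vla(data, 128)` and `stencil_vla(data, 2048)` on the same input should return identical lists, because the arithmetic per element is the same and only the chunking changes.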

Publication Date: Nov 1, 2018
Citations: 17
License type: other-oa

Similar Papers

Using Arm’s scalable vector extension on stencil codes
Adrià Armejach ... Rubén Langarita
The Journal of Supercomputing | VOL. 76
08 Apr 2019

Optimizing Stencil Codes with Exploiting Data Reuse
Xu Chang ... Li Shen
01 Oct 2021

Performing SVE Studies using the Arm Instruction Emulator
Miguel Tairum Cruz
01 Sep 2018

Preliminary Performance Evaluation of Application Kernels Using ARM SVE with Multiple Vector Lengths
Yuetsu Kodama ... Jinpil Lee
01 Sep 2017
