Abstract

Compiler optimization passes employ cost models to determine whether a code transformation will yield a performance improvement. When this assessment is inaccurate, compilers apply transformations that are not beneficial, or refrain from applying ones that would have improved the code. We analyze the accuracy of the cost models used in LLVM’s and GCC’s vectorization passes for three different instruction set architectures, covering both traditional SIMD architectures with a fixed vector register size (AVX2 and NEON) and a novel instruction set with a scalable vector length (SVE). In general, the speedup is overestimated, resulting in mispredictions and only a weak to moderate correlation between predicted and actual performance gain. We therefore propose a novel cost model that is based on a code’s intermediate representation and refined memory access pattern features. Using linear regression techniques, this platform-independent model is fitted to AVX2 and NEON hardware, as well as an SVE simulator. Results show that the fitted model significantly improves the correlation between predicted and measured speedup (AVX2: +52% for training data, +13% for validation data), reduces the average error of the speedup prediction (SVE: -43% for training data, -36% for validation data), and reduces the number of mispredictions (NEON: -88% for training data, -71% for validation data) for more than 80 code patterns.
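
The core idea, a platform-independent cost model fitted with linear regression over IR-level features against measured speedups, can be illustrated with a minimal sketch. The feature set and numbers below are illustrative assumptions, not the paper's actual features or data.

```python
import numpy as np

# Hypothetical IR-level feature matrix: one row per code pattern.
# Assumed columns: arithmetic ops, unit-stride loads, strided loads,
# gathers/scatters -- the real feature set is more refined.
X = np.array([
    [12, 4, 0, 0],
    [ 8, 2, 2, 0],
    [20, 6, 0, 1],
    [ 5, 1, 3, 2],
], dtype=float)

# Measured speedups (vectorized vs. scalar) on the target platform.
y = np.array([3.1, 1.8, 3.9, 0.7])

# Fit per-feature weights by ordinary least squares; the appended
# column of ones gives the model an intercept term.
A = np.hstack([X, np.ones((X.shape[0], 1))])
weights, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_speedup(features):
    """Predict the vectorization speedup of a new code pattern."""
    return float(np.append(np.asarray(features, dtype=float), 1.0) @ weights)

print(predict_speedup([10, 3, 1, 0]))
```

Because the features are extracted from the intermediate representation rather than target code, the same model structure can be refitted per platform (e.g., AVX2, NEON, or an SVE simulator) by replacing the measured speedups.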
