Automatic Vectorization of Interleaved Data Revisited

Andrew Anderson, Avinash Malik, David Gregg

doi:10.1145/2838735

Open Access

https://doi.org/10.1145/2838735

Abstract

Automatically exploiting short vector instruction sets (SSE, AVX, NEON) is a critically important task for optimizing compilers. Vector instructions typically work best on data that is contiguous in memory, and operating on non-contiguous data requires additional work to gather and scatter the data. There are several varieties of non-contiguous access, including interleaved data access. An existing approach used by GCC generates extremely efficient code for loops with power-of-2 interleaving factors (strides). In this paper we propose a generalization of this approach that produces similar code for any compile-time constant interleaving factor. In addition, we propose several novel program transformations, which were made possible by our generalized representation of the problem. Experiments show that our approach achieves significant speedups for both power-of-2 and non-power-of-2 interleaving factors. Our vectorization approach results in mean speedups over scalar code of 1.77x on Intel SSE and 2.53x on Intel AVX2 in real-world benchmarking on a selection of BLAS Level 1 routines. On the same benchmark programs, GCC 5.0 achieves mean improvements of 1.43x on Intel SSE and 1.30x on Intel AVX2. In synthetic benchmarking on Intel SSE, our maximum improvement on data movement is over 4x for gathering operations and over 6x for scattering operations versus scalar code.
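
As an illustration of what "interleaved data access" means here, the following is a minimal scalar sketch with an interleaving factor of 3, a non-power-of-2 stride of the kind the abstract mentions. The function names `deinterleave3` and `interleave3` and the (x, y, z) layout are assumptions made for this example; they are not taken from the paper, and the code shows only the scalar access pattern a vectorizer would have to handle, not the paper's transformation.

```c
#include <stddef.h>

/* Hypothetical example, not from the paper.
   Gather: read n interleaved (x, y, z) triples from `in` (stride 3)
   and write each component to its own contiguous array. */
void deinterleave3(const float *in, size_t n, float *x, float *y, float *z)
{
    for (size_t i = 0; i < n; i++) {
        x[i] = in[3 * i + 0];
        y[i] = in[3 * i + 1];
        z[i] = in[3 * i + 2];
    }
}

/* Scatter: the inverse operation, writing three contiguous arrays
   back into one interleaved array with stride 3. */
void interleave3(const float *x, const float *y, const float *z,
                 size_t n, float *out)
{
    for (size_t i = 0; i < n; i++) {
        out[3 * i + 0] = x[i];
        out[3 * i + 1] = y[i];
        out[3 * i + 2] = z[i];
    }
}
```

Each loop touches memory at a fixed stride of 3 elements, so a vector load of consecutive elements picks up a mix of x, y, and z values that must be rearranged before vector arithmetic can be applied to one component at a time.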

Journal: ACM Transactions on Architecture and Code Optimization
Publication Date: Dec 8, 2015
Citations: 21

Similar Papers

Lightweight Fault Attack Resistance in Software Using Intra-instruction Redundancy, Revisited
Hwajeong Seo ... Taehwan Park
01 Jan 2018

SIMD Vectorization of Non-Two-Power Sized FFTs
Franz Franchetti ... Markus Puschel
01 Apr 2007

Efficient Utilization of SIMD Extensions
F Franchetti ... C.W Ueberhuber
Proceedings of the IEEE | VOL. 93
01 Feb 2005

Autovesk: Automatic Vectorized Code Generation from Unstructured Static Kernels Using Graph Transformations
Hayfa Tayeb ... Bérenger Bramas
ACM Transactions on Architecture and Code Optimization | VOL. 21
15 Dec 2023
