Abstract

SIMD engines are widely present in commercial processors, where they improve application performance by exploiting Data-Level Parallelism (DLP). However, most SIMD engines rely on specific libraries and compilers to support DLP execution. This limits DLP gains, because these tools can only analyze static code. The Dynamic SIMD Assembler (DSA) [8] exploits DLP at runtime by identifying vectorizable loops and generating ARM NEON SIMD instructions for them. However, DSA does not reach its full DLP coverage: it leaves unvectorized the portions of code that depend on runtime information, such as loops whose iteration range is known only at runtime (dynamic-range loops) and loops containing conditional code. In this work, we extend DSA coverage by coupling conditional-code and dynamic-range loop vectorization. Results show that the proposed techniques improve performance over the original DSA by 38% on benchmarks that offer opportunities to exploit conditional-code and dynamic-range loops. In addition, the Extended DSA preserves software productivity and binary compatibility while outperforming ARM compiler auto-vectorization by 12%.
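To make the two loop classes concrete, here is a minimal plain-C sketch. It is our illustration of the general techniques, not DSA's implementation, which operates on the binary at runtime and emits ARM NEON instructions. The function names and the fixed four-lane width are assumptions chosen for illustration. The SIMD lanes are emulated with an inner loop so the code runs on any processor. The sketch shows two ideas: if-conversion, where both branch outcomes are computed and a per-lane mask selects the result; and a runtime split of the trip count `n` into full vector chunks plus a scalar epilogue.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4 /* assumption: a 128-bit NEON register holds four 32-bit ints */

/* Scalar reference: the trip count n is known only at runtime (dynamic range),
   and the body contains a data-dependent branch (conditional code). */
void scalar_loop(const int32_t *a, const int32_t *b, int32_t *c, int n) {
    for (int i = 0; i < n; i++) {
        if (a[i] > b[i])
            c[i] = a[i] - b[i];
        else
            c[i] = b[i] - a[i];
    }
}

/* Hypothetical SIMD-style version of the same loop, with lanes emulated in C. */
void simd_style_loop(const int32_t *a, const int32_t *b, int32_t *c, int n) {
    assert(n >= 0);
    int vec_end = n - (n % LANES); /* iterations covered by full vectors */

    for (int i = 0; i < vec_end; i += LANES) {
        for (int l = 0; l < LANES; l++) { /* one pass per emulated lane */
            int32_t then_v = a[i + l] - b[i + l]; /* "if" path, always computed */
            int32_t else_v = b[i + l] - a[i + l]; /* "else" path, always computed */
            /* all-ones or all-zeros mask, analogous to a vector compare */
            uint32_t mask = (a[i + l] > b[i + l]) ? 0xFFFFFFFFu : 0u;
            /* bitwise select, analogous to a vector select instruction */
            c[i + l] = (int32_t)(((uint32_t)then_v & mask) |
                                 ((uint32_t)else_v & ~mask));
        }
    }

    /* scalar epilogue for the n % LANES leftover iterations */
    for (int i = vec_end; i < n; i++)
        c[i] = (a[i] > b[i]) ? a[i] - b[i] : b[i] - a[i];
}
```

With `n = 7`, one four-lane chunk goes through the vector path and the remaining three iterations run in the scalar tail. Both functions should write identical results to `c`.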
