A scheme for accelerating image processing algorithms using SIMD for ARM Cortex A based systems

Nikhilesh Prasannakumar

doi:10.1109/iciiecs.2017.8275887

Abstract

Increasing computational requirements in embedded mobile devices, especially for image processing, necessitate hardware acceleration to sustain higher throughput. The ARM Cortex-A series of processors, found in the majority of mobile devices such as smartphones, features a single instruction, multiple data (SIMD) coprocessor called NEON. Its wide SIMD architecture allows many image processing operations and algorithms to be parallelized, significantly improving system throughput. This paper discusses the use of NEON intrinsics, C functions that map directly onto instructions of the NEON SIMD instruction set (an extension of the ARM instruction set). Intrinsics allow optimized image processing libraries to be integrated seamlessly with application code written in higher-level languages, while maintaining performance comparable to hand-optimized assembly language libraries. Several image processing operations and algorithms optimized with NEON intrinsics are benchmarked on a test bench and on real-world hardware platforms to validate the acceleration achieved in image processing tasks.
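To illustrate what an intrinsics-based kernel looks like, here is a minimal sketch of an RGB-to-grayscale conversion. It is not taken from the paper. The function name `rgb_to_gray` is hypothetical. The weights 77/150/29 are an assumed fixed-point approximation of the BT.601 luma coefficients (0.299, 0.587, 0.114), scaled by 256. The NEON path uses `vld3_u8` to de-interleave 8 RGB pixels per iteration and a widening multiply-accumulate. A scalar loop handles leftover pixels, and it also covers the whole image when NEON is unavailable.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical example kernel: convert interleaved 8-bit RGB
 * (R,G,B,R,G,B,...) to 8-bit grayscale using the assumed fixed-point
 * approximation Y = (77*R + 150*G + 29*B) >> 8. The weights sum to 256,
 * so the 16-bit accumulator never exceeds 255 * 256 = 65280. */
void rgb_to_gray(const uint8_t *rgb, uint8_t *gray, size_t num_pixels)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    const uint8x8_t wr = vdup_n_u8(77);
    const uint8x8_t wg = vdup_n_u8(150);
    const uint8x8_t wb = vdup_n_u8(29);
    /* Process 8 pixels (24 bytes) per iteration. */
    for (; i + 8 <= num_pixels; i += 8) {
        /* vld3_u8 de-interleaves 24 bytes into separate R, G and B vectors. */
        uint8x8x3_t px = vld3_u8(rgb + 3 * i);
        /* Widening multiply-accumulate: 8-bit inputs, 16-bit lanes. */
        uint16x8_t acc = vmull_u8(px.val[0], wr);
        acc = vmlal_u8(acc, px.val[1], wg);
        acc = vmlal_u8(acc, px.val[2], wb);
        /* Shift right by 8 and narrow back to 8 bits (truncating). */
        vst1_u8(gray + i, vshrn_n_u16(acc, 8));
    }
#endif
    /* Scalar path: handles the remaining pixels, or the whole image when
     * NEON is not available (e.g. when compiling for x86). It uses the
     * same truncating arithmetic, so both paths give identical results. */
    for (; i < num_pixels; ++i) {
        const uint8_t *p = rgb + 3 * i;
        gray[i] = (uint8_t)((77u * p[0] + 150u * p[1] + 29u * p[2]) >> 8);
    }
}
```

On AArch64 compilers, `__ARM_NEON` is defined by default. On 32-bit ARMv7 targets, GCC typically needs `-mfpu=neon` to enable the intrinsic path. Keeping a bit-exact scalar fallback makes it straightforward to check the vectorized output against a reference, which is the kind of comparison a benchmark test bench relies on.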
