Abstract

Increasing computational requirements in embedded mobile devices, especially for image processing, necessitate hardware acceleration to sustain higher throughput. The ARM Cortex-A series of processors, present in a majority of mobile devices such as smartphones, features a single instruction, multiple data (SIMD) coprocessor called NEON. Its wide SIMD architecture enables the parallelization of various image processing operations and algorithms, resulting in significant improvements in system throughput. This paper discusses the use of NEON intrinsics, a C-language interface to the NEON SIMD instruction set (itself an extension of the ARM instruction set). Intrinsics allow optimized image processing libraries to be integrated seamlessly with application code written in higher-level languages, while maintaining performance comparable to hand-optimized assembly language libraries. Several image processing operations and algorithms optimized using NEON intrinsics are benchmarked on a test bench as well as on real-world hardware platforms to validate the acceleration achieved in image processing tasks.
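
As a rough illustration of the intrinsics style described above, the following minimal sketch brightens an 8-bit grayscale buffer 16 pixels at a time. It is not taken from the paper: the function name `brighten_u8`, its parameters, and the `HAVE_NEON` macro are hypothetical and chosen only for this example, and the choice of a saturating brightness offset as the operation is an assumption. The intrinsics themselves (`vdupq_n_u8`, `vld1q_u8`, `vqaddq_u8`, `vst1q_u8`) come from the standard `arm_neon.h` header. A scalar loop handles leftover pixels and also serves as a fallback when NEON is not available at compile time.

```c
#include <stddef.h>
#include <stdint.h>

/* Use NEON only when the compiler targets it; otherwise fall back to scalar code. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper (not from the paper): add `delta` to each 8-bit pixel,
 * clamping the result at 255 instead of wrapping around. */
void brighten_u8(const uint8_t *src, uint8_t *dst, size_t n, uint8_t delta)
{
    size_t i = 0;

#if HAVE_NEON
    /* Broadcast delta into all 16 lanes of a 128-bit register. */
    const uint8x16_t vdelta = vdupq_n_u8(delta);

    /* Process 16 pixels per iteration with a saturating add. */
    for (; i + 16 <= n; i += 16) {
        uint8x16_t px = vld1q_u8(src + i);
        vst1q_u8(dst + i, vqaddq_u8(px, vdelta));
    }
#endif

    /* Scalar tail (and full fallback on targets without NEON). */
    for (; i < n; ++i) {
        unsigned sum = (unsigned)src[i] + delta;
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

Because the function is ordinary C, application code can call it like any other routine, for example `brighten_u8(image, out, width * height, 50)`, which is the kind of integration with higher-level code that intrinsics are meant to allow.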
