Exploiting SIMD extensions for linear image processing with OpenCL

Samuel Antao, Leonel Sousa

doi:10.1109/iccd.2010.5647672

Abstract

The OpenCL framework supports the SIMD capabilities available in general-purpose processors, which have been used to pursue performance improvements in several applications. In this paper we propose efficient algorithms for linear image processing that exploit the SIMD extensions provided by AMD and Intel processors. The efficiency of the SIMD-based code generated by the OpenCL compiler is also evaluated experimentally. Starting from a reference algorithm and implementation, we propose several optimizations that lead to progressively higher performance. Experimental results suggest an average 4-fold performance improvement when the vectorization of the operations is tuned, and a speedup of more than 10 times when an efficient data organization is also applied. The results further suggest that the SIMD-based OpenCL implementations are, on average, 1.8 times slower than equivalent implementations that directly employ the SIMD intrinsics supported by the Intel Compiler. Moreover, it is shown that real-time image processing is achieved when SIMD instructions are used.
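To make the idea of vectorizing a linear image operation concrete, the following is a minimal sketch in plain C, not the authors' algorithm or their OpenCL kernels. It assumes a single-channel, row-major `float` image and a 3x3 kernel. The function names `filter3x3_scalar` and `filter3x3_vec4` are hypothetical. The second function computes four adjacent output pixels per iteration, with each lane standing in for one component of a 4-wide vector type such as OpenCL's `float4`. Whether the compiler actually emits SIMD instructions for the lane loops depends on the compiler and its flags.

```c
#include <assert.h>
#include <stddef.h>

/* Weighted sum of the 3x3 neighbourhood centred on (x, y). */
static float pixel3x3(const float *src, int width, int x, int y,
                      const float k[9])
{
    float acc = 0.0f;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            acc += k[(dy + 1) * 3 + (dx + 1)] * src[(y + dy) * width + (x + dx)];
    return acc;
}

/* Scalar reference: one output pixel per iteration, interior pixels only.
 * Border pixels of dst are left untouched. */
void filter3x3_scalar(const float *src, float *dst, int width, int height,
                      const float k[9])
{
    assert(src != NULL && dst != NULL && src != dst);
    for (int y = 1; y < height - 1; ++y)
        for (int x = 1; x < width - 1; ++x)
            dst[y * width + x] = pixel3x3(src, width, x, y, k);
}

/* Vector-style variant: four adjacent output pixels per iteration.
 * The fixed-length lane loops are the part a compiler (or a float4
 * type in OpenCL) can map onto one SIMD register. */
void filter3x3_vec4(const float *src, float *dst, int width, int height,
                    const float k[9])
{
    assert(src != NULL && dst != NULL && src != dst);
    for (int y = 1; y < height - 1; ++y) {
        int x = 1;
        for (; x + 4 <= width - 1; x += 4) {
            float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    /* One weight is broadcast to all four lanes. */
                    const float w = k[(dy + 1) * 3 + (dx + 1)];
                    const float *p = &src[(y + dy) * width + (x + dx)];
                    for (int lane = 0; lane < 4; ++lane)
                        acc[lane] += w * p[lane];
                }
            for (int lane = 0; lane < 4; ++lane)
                dst[y * width + x + lane] = acc[lane];
        }
        /* Scalar tail for the interior pixels that do not fill a block. */
        for (; x < width - 1; ++x)
            dst[y * width + x] = pixel3x3(src, width, x, y, k);
    }
}
```

Both functions accumulate the nine products for a given pixel in the same order, so they are intended to produce the same output. The contiguous, row-major layout is what allows the four lanes to be read from consecutive addresses, which is one simple instance of the data-organization concern raised in the abstract.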
