Abstract

The OpenCL framework exposes the SIMD capabilities of general-purpose processors, which have been used to improve performance in several applications. In this paper we propose efficient algorithms for linear image processing that exploit the SIMD extensions of AMD and Intel processors. The efficiency of the SIMD-based computation inferred by the OpenCL compiler is also evaluated experimentally. Starting from a reference algorithm and implementation, we propose several optimizations that lead to progressively higher performance. Experimental results suggest an average 4-fold performance improvement when the vectorization of the operations is tuned, and a speedup of more than 10 times when efficient data organization is applied. The results also suggest that the SIMD-based OpenCL implementations are on average 1.8 times slower than equivalent implementations that directly use the SIMD intrinsics supported by the Intel Compiler. Moreover, it is shown that real-time image processing is achieved when SIMD instructions are used.
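To make the idea of tuning vectorization concrete, the following is a minimal sketch in plain C rather than OpenCL. It is not the paper's kernel: the pointwise transform `out = gain * in + bias` is only an assumed example of a linear image operation, and the function names `linear_scalar` and `linear_vec4` are hypothetical. The second function processes four pixels per outer iteration, loosely analogous to operating on an OpenCL `float4`.

```c
#include <assert.h>
#include <stddef.h>

/* Reference version (assumed example): one pixel per iteration. */
void linear_scalar(const float *in, float *out, size_t n,
                   float gain, float bias)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i)
        out[i] = gain * in[i] + bias;
}

/* 4-wide version: the fixed-length inner loop mirrors a float4 operation
 * and gives the compiler an easy target for SIMD code generation. */
void linear_vec4(const float *in, float *out, size_t n,
                 float gain, float bias)
{
    assert(in != NULL && out != NULL);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        for (int lane = 0; lane < 4; ++lane)
            out[i + lane] = gain * in[i + lane] + bias;
    }
    /* Scalar tail for pixel counts that are not a multiple of 4. */
    for (; i < n; ++i)
        out[i] = gain * in[i] + bias;
}
```

Both functions compute the same result. Whether the 4-wide form is actually faster depends on the compiler, its flags, and the memory layout of the image, which is the kind of question the experiments summarized above address.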
