Abstract

Convolution filtering is one of the most important algorithms in image processing. It is data-intensive, especially when dealing with high-definition images. Most previous studies on accelerating convolution computation in parallel focus on the use of graphics processing units (GPUs), whereas the central processing units (CPUs) always play the role of host to manage the data buffer and control flow. However, recent CPU architectures have seen significant modifications to parallel data computing capabilities, and the trend of integrating the CPU and GPU on a single chip is on a rise. We propose an approach to accelerate convolution filtering on the heterogeneous architecture of integrated CPU–GPU. We exploit the parallel processing power of vector instructions on a CPU and make it collaboratively function with the on-chip GPU. Two task assignment methods, static and dynamic task partitioning, are proposed for CPU–GPU collaboration. We evaluate our approach with images and filters of different sizes. The experimental results demonstrate that we can achieve 146 GFLOP/s at best using a quad-core CPU and the performance is 2.5 to 4.8 times faster than that of the single-GPU version of the OpenCV library. We also obtain 90 times speedup over the single-threaded CPU version. The results demonstrate that the proposed algorithm is efficient.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.