Abstract

This paper presents a processing technique for fast and energy-efficient image filtering, targeting energy- and time-sensitive embedded and robotic platforms. Digital video processing is becoming increasingly common in battery-powered devices such as mobile robots and smartphones, yet in most cases it places a heavy load on the main central processing unit (CPU) and consumes a significant amount of battery energy. The two-dimensional convolution algorithm is well suited to parallelism because there is no data dependency between its steps. We propose a vector version of the two-dimensional convolution algorithm that can run in parallel on embedded processors equipped with a general-purpose graphics processing unit (GPGPU), reducing both computation time and energy consumption. Our in-depth experiments show that using the GPGPU reduces execution time while guaranteeing lower power consumption and offloading the system CPU. Experimental results show that we achieved up to 105 times faster operation and 100 times lower energy consumption compared to the CPU implementation. In addition, we reduced the CPU overhead by up to a factor of 10.
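As a point of reference for the CPU baseline mentioned above, a minimal scalar two-dimensional convolution in C might look as follows. The function name, the row-major single-channel image layout, and the clamp-to-edge border handling are illustrative assumptions, not the paper's exact implementation.

```c
/* Scalar 2D convolution: each output pixel is a weighted sum of its
 * neighborhood. Row-major, single-channel float images; the kernel has
 * size (2*a+1) x (2*b+1). Border pixels are clamped to the nearest edge
 * (an assumption made for this sketch). */
void convolve2d_scalar(const float *f, float *g, int width, int height,
                       const float *w, int a, int b)
{
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float acc = 0.0f;
            for (int s = -a; s <= a; ++s) {
                for (int t = -b; t <= b; ++t) {
                    int yy = y + s;
                    int xx = x + t;
                    if (yy < 0) yy = 0; else if (yy >= height) yy = height - 1;
                    if (xx < 0) xx = 0; else if (xx >= width)  xx = width - 1;
                    acc += w[(s + a) * (2 * b + 1) + (t + b)] * f[yy * width + xx];
                }
            }
            g[y * width + x] = acc;
        }
    }
}
```

Because the four nested loops touch every pixel and every kernel tap sequentially, this baseline keeps a single CPU core busy for the entire filtering operation, which is the overhead the proposed GPGPU version is designed to remove.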

Highlights

  • Digital image processing, augmented reality and computer vision are rapidly progressing research fields with a large number of applications in both academia and industry

  • We conducted comprehensive experiments to measure the speed and energy consumption of the conventional central processing unit (CPU) algorithm and our proposed graphics processing unit (GPU) algorithm

  • The algorithm was applied to the image 50 times in both the CPU and GPU experiments


Introduction

Digital image processing, augmented reality and computer vision are rapidly progressing research fields with a large number of applications in both academia and industry. To increase the instructions-per-second rate of single-core processors, processor designers raised the clock frequency until physical drawbacks such as overheating, instability and power inefficiency appeared [3]. These drawbacks led the industry to develop Single Instruction Multiple Data (SIMD) machines and multicore processors [4]. Today, embedded mobile processors include many-core Graphics Processing Units (GPUs) with high data throughput. Since there is no data dependency among the individual output values of the two-dimensional convolution

g(x, y) = \sum_{s=-a}^{a} \sum_{t=-b}^{b} w(s, t)\, f(x + s, y + t),

where g(x, y) is the output image, f(x, y) is the input image and w is the filter kernel of size (2a + 1) × (2b + 1), the operation can be parallelized across small processors such as GPU cores.
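Because each output pixel g(x, y) depends only on the input image and the kernel, the work maps naturally to one GPU thread per pixel. The sketch below shows what such a kernel could look like in CUDA; CUDA itself, the kernel name, and the zero-padding border handling are assumptions made for illustration and are not necessarily the implementation used in the paper.

```cuda
// One thread computes one output pixel g(x, y); there is no data
// dependency between threads, so the whole image is filtered in parallel.
__global__ void convolve2d_kernel(const float *f, float *g,
                                  int width, int height,
                                  const float *w, int a, int b)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float acc = 0.0f;
    for (int s = -a; s <= a; ++s) {
        for (int t = -b; t <= b; ++t) {
            int yy = y + s;
            int xx = x + t;
            // zero padding outside the image (an illustrative choice)
            if (yy >= 0 && yy < height && xx >= 0 && xx < width)
                acc += w[(s + a) * (2 * b + 1) + (t + b)] * f[yy * width + xx];
        }
    }
    g[y * width + x] = acc;
}

// Host-side launch with 16x16 thread blocks covering the image:
// dim3 block(16, 16);
// dim3 grid((width + 15) / 16, (height + 15) / 16);
// convolve2d_kernel<<<grid, block>>>(d_f, d_g, width, height, d_w, a, b);
```

In this arrangement the grid covers the whole image, so execution time is governed by the number of GPU cores rather than by a single CPU core iterating over every pixel.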

Architecture
Scalar Algorithm
Vectoral Algorithm
Experimental Results
Conclusion