Abstract
This paper presents a processing technique for fast and energy-efficient image filtering, targeting energy- and time-sensitive embedded and robotic platforms. Digital video processing is increasingly common in battery-powered devices such as mobile robots and smartphones, but in most cases it imposes overhead on the main central processing unit (CPU) and consumes a significant amount of battery energy. The two-dimensional convolution algorithm is well suited to parallelism because there is no data dependency between its steps. We propose a vectorized version of the two-dimensional convolution algorithm that runs in parallel on embedded processors equipped with a general-purpose graphics processing unit (GPGPU), reducing computation time and energy consumption. Our in-depth experiments show that using the GPGPU reduces execution time while guaranteeing lower power consumption and offloading the system CPU. Experimental results showed up to 105 times faster operation and 100 times less energy consumption compared with the CPU implementation. In addition, we reduced the CPU overhead by up to 10 times.
Highlights
Digital image processing, augmented reality and computer vision are rapidly progressing research fields with a large number of applications in both academia and industry
We performed comprehensive experiments to measure the speed and energy consumption of the conventional central processing unit (CPU) algorithm and our proposed graphics processing unit (GPU) algorithm
The algorithm was applied to the image 50 times in both the CPU and GPU experiments
Summary
Digital image processing, augmented reality and computer vision are rapidly progressing research fields with a large number of applications in both academia and industry. To increase the instructions-per-second rate of single-core processors, processor designers raised the clock frequency until physical drawbacks appeared, such as overheating, instability and power inefficiency [3]. These drawbacks led the industry to develop Single Instruction Multiple Data (SIMD) machines and multicore processors [4]. Today, embedded mobile processors include many-core Graphics Processing Units (GPUs) with high data throughput. Since there is no data dependency among the individual convolutions, the operation can be parallelized across small processors such as GPU cores. The filtering operation is the two-dimensional convolution

g(x, y) = Σ_s Σ_t w(s, t) f(x + s, y + t),

where g(x, y) is the output image, f(x, y) is the input image and w is the filter kernel.
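The convolution described above can be sketched as follows. This is a minimal, illustrative Python version (not the paper's GPGPU implementation): it assumes an odd-sized kernel and zero padding at the image borders, and it shows why the operation parallelizes well, since every output pixel is computed independently of the others.

```python
# Naive 2D convolution sketch: g(x, y) = sum over (s, t) of w(s, t) * f(x+s, y+t).
# Assumptions (not from the paper): odd kernel size, zero padding at borders.

def convolve2d(image, kernel):
    """Apply `kernel` to `image`; both are lists of lists of numbers."""
    kh, kw = len(kernel), len(kernel[0])
    pad_y, pad_x = kh // 2, kw // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):        # each (y, x) iteration is independent of the
        for x in range(w):    # others, so it can be mapped to one GPU core
            acc = 0.0
            for s in range(kh):
                for t in range(kw):
                    yy, xx = y + s - pad_y, x + t - pad_x
                    if 0 <= yy < h and 0 <= xx < w:  # zero padding outside
                        acc += kernel[s][t] * image[yy][xx]
            out[y][x] = acc
    return out

# Example: 3x3 box-blur kernel applied to a constant 4x4 image.
blur = [[1 / 9] * 3 for _ in range(3)]
img = [[9.0] * 4 for _ in range(4)]
result = convolve2d(img, blur)
```

In a GPGPU version, the two outer loops disappear: each thread is assigned one (x, y) coordinate and computes only the inner accumulation, which is what removes the data dependency bottleneck discussed above.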