Architecture Exploration for Energy-Efficient Embedded Vision Applications: From General Purpose Processor to Domain Specific Accelerator

Maria Malik,Nima Akhlaghi,Paul Otto,Houman Homayoun,Farnoud Farahmand,Siddhartha Sikdar,Tinoosh Mohsenin

doi:10.1109/isvlsi.2016.112

Abstract

OpenCV applications are computationally intensive tasks among computer vision algorithms. The demand for low power yet high performance real-time processing of OpenCV embedded vision applications have led to developing their customized implementations on state-of-the-art embedded processing platforms. Given the industry move to heterogeneous platforms which integrates single core or multicore CPU with on-chip FPGA accelerators and GPU accelerators, the question of what platform and what implementation, whether hardware or software, is best suited for energy-efficient processing of this class of applications is becoming important. In this paper, we seek to answer this question through a detailed hardware and software implementation of OpenCV applications and methodically measurement and comprehensive analysis of their power and performance on state-of-the-art heterogeneous embedded processing platforms. The results show that in addition to application behavior, the size of image is an important factor in deciding the efficient platform in terms of highest energy-efficiency (EDP) among hardware accelerators on FPGA and software accelerators on GPU and multicore CPUs. While hardware implementation on ZYNQ shown to be the most performance and energy-efficient for image size of 500x500 or less, software GPU implementation found to be the most efficient and achieves highest speedup for larger image sizes. In addition, while for compute intensive vision applications the gap between FPGA, CPU and GPU reduces as the size of image increases, for non-intensive applications, a large performance and EDP gap is observed between the studied platforms, as the size of the image increases.

Full Text