Abstract

Vision processing chips have been widely used in image processing and recognition tasks. They are conventionally designed around image signal processing (ISP) units connected directly to the sensors. In recent years, convolutional neural networks (CNNs) have become the dominant tools for many state-of-the-art vision processing tasks. However, a conventional vision processing unit (VPU) cannot process CNNs at high speed. On the other hand, CNN processing units cannot process RAW images from the sensors directly, so an ISP unit is still required. This makes a vision system inefficient, with substantial data transmission and redundant hardware resources. Additionally, many CNN processing units offer only limited flexibility for the variety of CNN operations. To solve this problem, this paper proposes an efficient vision processing unit based on a hybrid processing-element array for both CNN acceleration and ISP. Resources are highly shared in this VPU, and a pipelined workflow is introduced to accelerate vision tasks. We implement the proposed VPU on a Field-Programmable Gate Array (FPGA) platform and test various vision tasks on it. The results show that this VPU achieves high efficiency for both CNN processing and ISP, and significantly reduces energy consumption for vision tasks that combine CNNs with ISP. For various CNN tasks, it maintains an average multiply-accumulator utilization of over 94% and achieves a performance of 163.2 GOPS at a frequency of 200 MHz.
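
As a rough, back-of-envelope reading of the reported figures (not a calculation given in the paper), the sketch below relates the 163.2 GOPS, 200 MHz, and over-94%-utilization numbers under the common assumption that one multiply-accumulate counts as two operations; the variable names and the implied array size are illustrative only.

```python
# Back-of-envelope check relating the abstract's throughput figures.
# Assumption: "GOPS" counts one multiply-accumulate (MAC) as 2 operations,
# a common convention that the abstract does not state explicitly.

FREQ_HZ = 200e6          # reported clock frequency
REPORTED_GOPS = 163.2    # reported sustained throughput
UTILIZATION = 0.94       # reported average MAC utilization ("over 94%")

ops_per_cycle = REPORTED_GOPS * 1e9 / FREQ_HZ       # 816 operations per cycle
effective_macs = ops_per_cycle / 2                  # ~408 MACs busy per cycle
implied_array_size = effective_macs / UTILIZATION   # at most ~434 physical MACs

print(f"{ops_per_cycle:.0f} ops/cycle, {effective_macs:.0f} effective MACs, "
      f"at most ~{implied_array_size:.0f} physical MACs implied")
```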

Highlights

  • Vision processing chips have proven to be highly efficient for computer vision tasks by integrating the image sensor and the vision processing unit (VPU) together, as shown in recent works [1,2,3]

  • Complete vision tasks consisting of both image signal processing (ISP) and convolutional neural network (CNN) processing were tested on the proposed design

  • The performance of the proposed VPU on ISP tasks is compared with the VPUs of previous works in Table 2

Introduction

Vision processing chips have proven to be highly efficient for computer vision tasks by integrating the image sensor and the vision processing unit (VPU) together, as shown in recent works [1,2,3]. Most of them utilize a Single-Instruction-Multiple-Data (SIMD) array of processing elements (PEs) connected directly to the sensor. They eliminate the pixel-transmission bottleneck and execute vision tasks in parallel. The VPUs proposed in [1,11] try to exploit the conventional PE array for self-organizing map (SOM) neural networks, but these conventional architectures are not efficient for modern neural networks. The convolutional neural networks (CNNs) are very im-
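
To make the resource-sharing idea from the abstract concrete: both a CNN convolution layer and many ISP filters (denoising, sharpening, demosaic interpolation) reduce to the same sliding-window multiply-accumulate pattern, which is what allows a single PE/MAC array to serve both stages. The NumPy sketch below only illustrates that pattern and is not the paper's architecture; the function name, kernel values, and image sizes are made up for the example.

```python
import numpy as np

def sliding_window_mac(image, kernel):
    """Apply a kernel as a sliding-window multiply-accumulate.

    Both a CNN convolution and many ISP filters reduce to this pattern,
    which is the premise for sharing one PE/MAC array between them.
    """
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1), dtype=np.float32)
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            # On a SIMD PE array, this inner reduction is what the
            # processing elements execute in parallel across pixels.
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

raw = np.random.rand(16, 16).astype(np.float32)            # stand-in for RAW sensor data
isp_smooth = np.full((3, 3), 1.0 / 9, dtype=np.float32)    # simple ISP smoothing filter
cnn_filter = np.random.randn(3, 3).astype(np.float32)      # one learned CNN filter

preprocessed = sliding_window_mac(raw, isp_smooth)          # ISP stage
feature_map = sliding_window_mac(preprocessed, cnn_filter)  # CNN stage
print(feature_map.shape)  # (12, 12)
```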

