Abstract

Deep neural networks (DNNs) have delivered unprecedented achievements in the modern Internet of Everything society, encompassing autonomous driving, expert diagnosis, unmanned supermarkets, and more. Nevertheless, developing a high-performance neuromorphic processor for deployment in edge devices or embedded hardware remains challenging for researchers and engineers. The power of DNNs derives from their enormous and complex network architectures, which are computation-intensive, time-consuming, and energy-hungry. Because human perceptual capacity is limited, the highly accurate results that DNNs spend substantial computing time producing are partly redundant in some applications. Utilizing adaptive quantization technology to compress a DNN model while preserving sufficient accuracy is therefore crucial for deploying neuromorphic processors in emerging edge applications. This study proposes a method to advance neuromorphic processor development by performing fixed-point multiplication in a hybrid Q-format, using an adaptive quantization technique on the convolution layers of tiny YOLO3. In particular, this work integrates sign-bit checking and bit roundoff into the fixed-point multiplication arithmetic to address the overflow and roundoff issues that arise in the convolution's multiply-and-add operations. In addition, a hybrid Q-format multiplication module is developed to evaluate the proposed method from a hardware perspective. The experimental results show that hybrid multiplication with adaptive quantization of tiny YOLO3's weights and feature maps yields a lower error rate than alternative fixed-point representation formats while sustaining the same object detection accuracy. Moreover, fixed-point numbers represented in Q(6.9) achieve a suboptimal error rate and can serve as an alternative representation format for a tiny YOLO3-based neuromorphic processor design. Finally, the 8-bit hybrid Q-format multiplication module exhibits lower power consumption and latency than benchmark multipliers.
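To make the Q-format arithmetic concrete, below is a minimal C sketch of a Q(6.9) fixed-point multiply (16-bit word: 1 sign bit, 6 integer bits, 9 fractional bits) with rounding of the discarded fractional bits and saturation on overflow. The names and constants (q_mul, FRAC_BITS, etc.) are illustrative assumptions, not the paper's actual module, and the saturation step only approximates the sign-bit-check behavior described above.

```c
#include <stdint.h>
#include <stdio.h>

#define FRAC_BITS 9             /* fractional bits in Q(6.9)    */
#define Q_MAX     INT16_MAX     /* largest representable value  */
#define Q_MIN     INT16_MIN     /* smallest representable value */

typedef int16_t q6_9_t;

/* Conversions between real numbers and Q(6.9), for test stimulus only. */
static q6_9_t to_q(double x)    { return (q6_9_t)(x * (1 << FRAC_BITS)); }
static double from_q(q6_9_t q)  { return (double)q / (1 << FRAC_BITS); }

/* Multiply two Q(6.9) values: widen to 32 bits so the full product fits,
 * round half up when discarding the extra fractional bits (bit roundoff),
 * then saturate instead of wrapping on overflow (sign-bit check analog). */
static q6_9_t q_mul(q6_9_t a, q6_9_t b)
{
    int32_t prod = (int32_t)a * (int32_t)b;  /* Q(13.18) intermediate   */
    prod += 1 << (FRAC_BITS - 1);            /* add half LSB to round   */
    prod >>= FRAC_BITS;                      /* rescale back to Q(6.9)  */
    if (prod > Q_MAX) return Q_MAX;          /* clamp positive overflow */
    if (prod < Q_MIN) return Q_MIN;          /* clamp negative overflow */
    return (q6_9_t)prod;
}

int main(void)
{
    q6_9_t w = to_q(1.5), x = to_q(-2.25);
    /* 1.5 * -2.25 = -3.375, exactly representable in Q(6.9). */
    printf("result = %f\n", from_q(q_mul(w, x)));
    return 0;
}
```

In an accumulating multiply-and-add loop, the same widen-round-saturate pattern would apply to the adder as well, which is where the overflow issues the abstract mentions typically surface.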
