Abstract

Ultra-low power consumption and flexible hardware configurability are urgently needed for resource-constrained artificial intelligence of things (AIoT). We therefore propose a convolutional engine based on tensor multiplication. It consists of a reconfigurable processing element (RPE) array that dynamically adapts convolutional operations to varying kernel sizes at runtime. Implemented in a 22-nm CMOS process, the proposed RPE cluster achieves high energy efficiency, flexibility, and resource utilization with low hardware overhead compared to state-of-the-art (SOTA) PE architectures. Furthermore, the multiply-accumulate (MAC) operations in the RPE support an accurate mode and multiple approximate computing modes. The approximate modes provide a configurable approximation degree, coupled with a search strategy for the approximation factor, to improve energy efficiency. In a neural network-based keyword spotting (KWS) task, the RPE cluster with the proposed approximation solution reduces the whole system's power consumption by 34.64% with only 0.7% accuracy loss compared to using the accurate computing mode throughout, and it achieves the lowest inference energy among the SOTA designs.
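The abstract does not detail how the approximation degree is realized in the MAC datapath. A common hardware technique for a configurable approximation degree is truncating low-order operand bits before multiplication, so the sketch below is an illustrative assumption, not the paper's circuit: `approx_mac` and its approximation factor `k` are hypothetical names, with `k = 0` reproducing the accurate mode.

```python
def approx_mac(acc, a, b, k=0):
    """Illustrative approximate multiply-accumulate (not the paper's circuit).

    The k least-significant bits of each operand are zeroed before the
    multiply, trading accuracy for a narrower (lower-energy) multiplier.
    k is the approximation factor; k = 0 is the accurate mode.
    """
    a_t = (a >> k) << k  # drop k low-order bits of operand a
    b_t = (b >> k) << k  # drop k low-order bits of operand b
    return acc + a_t * b_t

# Accurate mode (k = 0) vs. an approximate mode (k = 2):
exact = approx_mac(0, 13, 7, k=0)   # 13 * 7 = 91
approx = approx_mac(0, 13, 7, k=2)  # 12 * 4 = 48
```

A search strategy like the one the abstract mentions could then sweep `k` upward and keep the largest value whose task accuracy loss stays within a budget (e.g., 0.7% on KWS), maximizing the energy saving.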
