Abstract

Convolutional neural networks (CNNs) have become very popular in image classification tasks. With the increasing demand for intelligent classification on battery-powered devices, energy-efficient ASICs for CNNs are badly needed. Previous CNN ASIC processors support operations with different kernel sizes, but they sacrifice efficiency to provide this flexibility. In fact, convolution operations with one particular kernel size dominate many real-world CNNs. This brief proposes a kernel-optimized architecture for $3\,{\times }\,3$ kernels (KOP3), whose operations dominate mainstream image classification CNNs. Although KOP3 targets $3\,{\times }\,3$ kernel operations, it also provides programmability to support arbitrary kernel sizes. KOP3 achieves an average energy efficiency of 3.77 TOPS/W, which is $4.01{ \times }$ better than the best state-of-the-art CNN ASIC processor.

