Abstract

State-of-the-art convolutional neural networks (CNNs) usually have many layers and filter weights, which incur heavy computation and communication overheads. A general-purpose instruction set architecture (ISA) is flexible but suffers from low code density and high power consumption. Existing CNN-specific accelerators are far more efficient but are usually inflexible or require a complex controller to handle the computation and data transfer of different CNNs. In this brief, we propose a new CNN-specific ISA that embeds the parallel-computation and data-reuse parameters in the instructions. An instruction generator sets these parameters according to the features of the CNN and the hardware's computation and storage resources. In addition, a reconfigurable accelerator with 225 multipliers and 24 adder trees is realized to achieve efficient parallel computation and data transfer. Compared with x86 processors, our design achieves 392 times better energy efficiency and 16 times higher code density. Compared with other state-of-the-art accelerators, our solution offers greater flexibility, supporting all popular CNNs, as well as higher energy efficiency.
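To make the central idea concrete, the sketch below illustrates what embedding parallel-computation and data-reuse parameters directly in an instruction word might look like. The field names, widths, and layout are assumptions for illustration only, not the paper's actual encoding.

```python
# Hypothetical sketch of a CNN-specific instruction word that embeds
# parallelism and data-reuse parameters in the instruction itself.
# Field widths and names are assumptions, not the paper's real encoding.

OPCODE_BITS, PAR_BITS, REUSE_BITS, TILE_BITS = 4, 8, 8, 12  # 32 bits total

def encode(opcode, par_factor, reuse_rows, tile_width):
    """Pack the four fields into a single 32-bit instruction word."""
    word = opcode
    word = (word << PAR_BITS) | par_factor    # e.g. number of parallel MAC units
    word = (word << REUSE_BITS) | reuse_rows  # feature-map rows kept on-chip for reuse
    word = (word << TILE_BITS) | tile_width   # tile width chosen for the current layer
    return word

def decode(word):
    """Unpack an instruction word back into its four fields."""
    tile_width = word & ((1 << TILE_BITS) - 1); word >>= TILE_BITS
    reuse_rows = word & ((1 << REUSE_BITS) - 1); word >>= REUSE_BITS
    par_factor = word & ((1 << PAR_BITS) - 1); word >>= PAR_BITS
    opcode = word
    return opcode, par_factor, reuse_rows, tile_width

# Round-trip check with illustrative values (225 parallel MACs, 24 reuse rows)
fields = (0x3, 225, 24, 112)
assert decode(encode(*fields)) == fields
```

An instruction generator, as described in the abstract, would choose values such as the parallelism factor and tile width per layer, so the accelerator's controller stays simple while still adapting to different CNNs.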
