Abstract

In emerging edge systems, AI algorithms and their hardware implementations are often jointly optimized as integrated solutions to end-to-end design problems. Such joint optimization depends on a delicate co-design of software and hardware. To the best of our knowledge, existing co-design methodologies remain coarse-grained. In this paper, we propose ANNA (Accelerating Neural Network Accelerator), a novel software-hardware co-design methodology. ANNA is a framework composed of three components: ANNA-NAS (Neural Architecture Search), ANNA-ARCH (hardware ARCHitecture), and ANNA-PERF (PERFormance optimizer and evaluator). ANNA-NAS adopts a cell-wise structure and is designed to be hardware-aware; it aims to generate a neural network with high inference accuracy and low inference latency. To avoid prohibitive search time, ANNA-NAS combines differentiable architecture search with early-stopping techniques. The design of ANNA-ARCH begins as soon as the architecture search space is defined. Based on the cell-wise structure, ANNA-ARCH specifies a main body consisting of Convolution units, an Activation Router, and a Buffer Pool. To support the different neural networks that ANNA-NAS may generate, the detailed parts of ANNA-ARCH are configurable. ANNA-PERF harmonizes the co-design of ANNA-NAS and ANNA-ARCH: it takes a neural network and a hardware architecture as inputs, optimizes the mapping strategy between the neural network and the hardware accelerator, and feeds a cycle-accurate latency estimate back to ANNA-NAS. Targeting image classification, we carried out experiments on ImageNet. The results demonstrate that, with little loss of inference accuracy, ANNA achieves significantly lower inference latency through harmonious software-hardware co-design.
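To make the hardware-aware search objective concrete, the sketch below shows one common way such feedback can be folded into a NAS loss: a task loss plus a penalty on latency exceeding a cycle budget. This is an illustrative assumption, not the paper's actual formulation; the function name, the penalty shape, and the `weight` parameter are all hypothetical.

```python
def hardware_aware_loss(task_loss, latency_cycles, target_cycles, weight=0.1):
    """Illustrative hardware-aware NAS objective (not ANNA's published one).

    Combines a task loss (e.g. classification cross-entropy) with a
    penalty on cycle-accurate latency, such as the latency a performance
    evaluator like ANNA-PERF might feed back to the search.
    """
    # Penalize only the fraction of latency that exceeds the cycle budget;
    # architectures within budget are judged on accuracy alone.
    over_budget = max(0.0, latency_cycles / target_cycles - 1.0)
    return task_loss + weight * over_budget

# A candidate within the 1000-cycle budget incurs no penalty:
print(hardware_aware_loss(0.5, 900, 1000))   # 0.5
# A candidate 50% over budget is penalized in proportion to the overshoot:
print(hardware_aware_loss(0.5, 1500, 1000))
```

In a differentiable-search setting the penalty term would be a smooth, differentiable latency estimate rather than a hard `max`, so gradients can flow to the architecture parameters.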
