Abstract

Field-programmable gate arrays (FPGAs) have become a popular compute platform for convolutional neural network (CNN) inference; however, the design of a CNN model and its FPGA accelerator has been inherently sequential. A CNN is first prototyped with little or no hardware awareness to attain high accuracy; subsequently, an FPGA accelerator is tuned to that specific CNN to maximize its efficiency. Instead, we formulate a neural architecture search (NAS) optimization problem that contains parameters from both the CNN model and the FPGA accelerator, and we jointly search for the best CNN model-accelerator pair that boosts accuracy and efficiency; we call this Codesign-NAS. In this paper, we focus on defining the Codesign-NAS multiobjective optimization problem, demonstrating its effectiveness, and exploring different ways of navigating the codesign search space. For CIFAR-10 image classification, we enumerate close to 4 billion model-accelerator pairs and find the Pareto frontier within that large search space. Next, we propose accelerator innovations that improve the entire Pareto frontier. Finally, we compare against ResNet on a highly tuned accelerator and show that, using codesign, we can improve CIFAR-100 classification accuracy by 1.8% while simultaneously increasing performance/area by 41% in just 1000 GPU-hours of running Codesign-NAS, demonstrating that our automated codesign approach is superior to sequential design of a CNN model and accelerator.
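The Pareto-frontier step mentioned in the abstract can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: it assumes each enumerated model-accelerator pair has been reduced to two objectives, accuracy and performance/area (higher is better for both), and keeps only the non-dominated pairs. The function name and the toy data are hypothetical.

```python
def pareto_frontier(points):
    """Return the points not dominated in both objectives.

    Each point is an (accuracy, perf_per_area) tuple; a point is
    dominated if some other point is at least as good in both
    objectives and different from it.
    """
    frontier = []
    for acc, ppa in points:
        dominated = any(
            a >= acc and p >= ppa and (a, p) != (acc, ppa)
            for a, p in points
        )
        if not dominated:
            frontier.append((acc, ppa))
    return sorted(frontier)

# Toy results for four hypothetical model-accelerator pairs.
results = [(0.92, 1.0), (0.90, 1.5), (0.93, 0.8), (0.89, 1.2)]
print(pareto_frontier(results))  # -> [(0.90, 1.5), (0.92, 1.0), (0.93, 0.8)]
```

The pair (0.89, 1.2) is dropped because (0.90, 1.5) beats it on both objectives; the remaining three each trade accuracy against efficiency, which is exactly the shape of frontier the codesign search navigates. This quadratic all-pairs check is fine for illustration; at the scale of billions of pairs a sort-based sweep would be used instead.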
