Abstract

Deep neural networks (DNNs) are widely used in artificial intelligence (AI) applications. However, their large demand for computing and storage resources, together with high power consumption, makes deploying DNN models on embedded devices challenging. Recent work has shown that DNN models can be compressed by removing internal redundancy without obvious performance degradation. In this work, we propose a two-stage pipeline to compress a ResNet-14 model and evaluate it on the CIFAR-10 and SVHN datasets. First, we apply a filter-level pruning method to remove less important filters at different compression rates, which reduces a considerable amount of computation. Second, we binarize the pruned model to further reduce model size and computational complexity. Training results show that we achieve 87.7% accuracy with a model size of only 1.86 Mb on CIFAR-10, and 96.2% accuracy with 1.34 Mb on SVHN. Compared with the original model, this amounts to a 57% to 68% reduction in FLOPs and 45.6× to 63.1× model size compression at the cost of roughly a 4% accuracy drop. Finally, we implement the thin binarized ResNet-14 model on a Xilinx KC705 board with a shared, flexible accumulator, which saves 46.8% of logic resources. All network parameters are stored in on-chip RAM, which greatly reduces the energy consumption and memory overhead of off-chip accesses. Experimental results on CIFAR-10 show an overall throughput of 1200 FPS and an energy efficiency of 571 FPS/W, which are 2.3× and 3.6× improvements over the most recent prior work.
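
To make the two-stage idea concrete, the sketch below prunes a convolution's filters and binarizes the remaining weights in PyTorch. The pruning criterion (L1 norm of each filter), the keep ratio, and the straight-through sign binarization are illustrative assumptions, not the paper's exact method or training schedule; adjusting the following layer's input channels after pruning is also omitted here.

```python
# Minimal sketch of a prune-then-binarize pipeline (assumed details, see lead-in).
import torch
import torch.nn as nn

def prune_conv_filters(conv: nn.Conv2d, keep_ratio: float) -> nn.Conv2d:
    """Stage 1: keep the filters with the largest L1 norm (assumed criterion)."""
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # one score per filter
    keep = torch.topk(scores, n_keep).indices
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data = conv.weight.data[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    # Note: the next layer's input channels must be pruned to match (omitted).
    return pruned

class BinarizeWeights(torch.autograd.Function):
    """Stage 2: map weights to {-1, +1}, straight-through gradient estimator."""
    @staticmethod
    def forward(ctx, w):
        return torch.where(w >= 0, torch.ones_like(w), -torch.ones_like(w))

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out  # pass gradients straight through the sign function

class BinaryConv2d(nn.Conv2d):
    def forward(self, x):
        w_bin = BinarizeWeights.apply(self.weight)
        return nn.functional.conv2d(x, w_bin, self.bias, self.stride,
                                    self.padding, self.dilation, self.groups)
```

Pruning first shrinks the number of filters (and hence FLOPs); binarization then reduces each remaining weight to a single bit, which is what allows the whole parameter set to fit in on-chip RAM on the FPGA.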
