Abstract

Edge computing provides new possibilities for Deep Neural Network (DNN) applications, but the constrained resources of edge devices limit real-time inference. Much effort has been devoted to DNN pruning; however, conventional pruning methods often degrade detection accuracy. In this work, we investigate how pruning can be performed without paying the price of performance loss. Conventional filter-level pruning methods remove filters deemed less important; the drawback is that the pruned information is lost and the detection accuracy deteriorates. To address this issue, we propose a new filter-level pruning method that enables real-time DNN inference on edge devices while minimizing the drop in detection accuracy. The proposed method analyzes each layer's properties and assigns it a different pruning ratio, proportional to how much pruning that layer will impact the overall detection accuracy. Within each layer, a percentage of the filters that produce similar information is removed. Experimental results show that our pruning method yields accurate and compact DNNs with minimal performance loss. For a custom object detection network based on the popular YOLO v3, we reduce the network filters by 60% and the floating-point operations (FLOPs) by 80%. Experimental evaluations on a Jetson Nano demonstrate that the proposed pruning method increases inference speed by 16%. Compared to the original heavy network, the pruned network achieves comparable accuracy with significantly lower memory usage and computational cost, making the solution suitable for real-time inference.
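
To make the within-layer step concrete, the following is a minimal, illustrative sketch of similarity-based filter selection, not the paper's actual implementation. It assumes cosine similarity over flattened convolution filter weights, a hypothetical 0.9 redundancy threshold, and a per-layer pruning ratio supplied externally (for example, from a sensitivity analysis of how pruning each layer affects detection accuracy); the function name and greedy criterion are illustrative choices.

```python
# Sketch only: pick redundant filters in one conv layer by weight similarity.
# The threshold, visiting order, and tie-breaking are assumptions, not the
# authors' method.
import numpy as np

def select_filters_to_prune(weights: np.ndarray, prune_ratio: float) -> list[int]:
    """Return indices of filters to remove from one conv layer.

    weights     : array of shape (num_filters, in_channels, k, k)
    prune_ratio : fraction of this layer's filters to drop (set per layer)
    """
    num_filters = weights.shape[0]
    flat = weights.reshape(num_filters, -1)
    unit = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-12)
    sim = unit @ unit.T                       # pairwise cosine similarity

    budget = int(prune_ratio * num_filters)
    pruned: list[int] = []
    kept: list[int] = []
    # Greedy pass: drop a filter if it is highly similar to one already kept,
    # i.e. it contributes largely redundant information.
    for i in np.argsort(-np.abs(flat).sum(axis=1)):   # larger-magnitude filters first
        if len(pruned) < budget and any(sim[i, j] > 0.9 for j in kept):
            pruned.append(int(i))
        else:
            kept.append(int(i))
    # If the similarity threshold did not fill the budget, drop the most
    # redundant remaining filters (highest similarity to another kept filter).
    while len(pruned) < budget and kept:
        redundancy = [max((sim[i, j] for j in kept if j != i), default=0.0)
                      for i in kept]
        pruned.append(kept.pop(int(np.argmax(redundancy))))
    return sorted(pruned)

# Toy usage: 8 filters, prune 50%; filter 4 is a near-duplicate of filter 0.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 3, 3, 3))
w[4] = w[0] * 1.1
print(select_filters_to_prune(w, prune_ratio=0.5))
```

In an actual pruning pipeline, the selected filters (and the corresponding input channels of the following layer) would be removed and the network fine-tuned to recover accuracy.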
