Abstract

Deep Neural Networks (DNNs) have shown superior performance on a variety of artificial intelligence problems. Reducing the resource usage of DNNs is critical to adding intelligence to Internet of Things (IoT) devices. Network compression based on channel pruning reduces storage, memory, and computation simultaneously, without requiring specialized software on general-purpose platforms. However, because their pruning flexibility is limited, channel pruning methods achieve a relatively low compression rate for a given target performance. In this paper, we demonstrate that channel pruning becomes more robust to decision errors when the granularity of filters is reduced. We then propose a Decouple and Stretch (DS) scheme to enhance channel pruning. Under this scheme, each filter in a specific layer is decoupled into two small spatial-wise filters, and the spatial-wise filters are stretched into two successive convolutional layers. Our scheme obtains up to a 49% improvement in compression and a 35% improvement in acceleration. To further demonstrate hardware compatibility, we deploy the pruned networks on an FPGA, where the network produced by the Decouple and Stretch scheme is more hardware-friendly, with latency reduced by 42%.
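
As a rough illustration of why decoupling a filter can help pruning, the sketch below counts weights for one convolutional layer before and after it is split into two successive layers. It is not the paper's implementation. It assumes the two "spatial-wise" filters are a k x 1 kernel followed by a 1 x k kernel, which the abstract does not state, and the helper names (conv_params, decoupled_params) and the channel counts are hypothetical.

```python
def conv_params(c_in, c_out, k_h, k_w):
    """Number of weights in a convolutional layer (bias ignored)."""
    return c_in * c_out * k_h * k_w


def decoupled_params(c_in, c_mid, c_out, k):
    """Weights in two successive layers: a k x 1 layer, then a 1 x k layer.

    Assumption: this k x 1 / 1 x k split stands in for the two
    spatial-wise filters; c_mid is the channel count between them.
    """
    return conv_params(c_in, c_mid, k, 1) + conv_params(c_mid, c_out, 1, k)


# Hypothetical layer: 64 input channels, 64 output channels, 3 x 3 kernels.
original = conv_params(64, 64, 3, 3)

# The same layer stretched into two layers, with no channels pruned yet.
stretched = decoupled_params(64, 64, 64, 3)

# Pruning half of the intermediate channels shrinks both new layers.
pruned = decoupled_params(64, 32, 64, 3)

print(original, stretched, pruned)
```

In this toy setting, the intermediate channel count gives channel pruning an extra dimension to act on, and each pruned channel removes a smaller group of weights than a full 3 x 3 filter would.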
