Abstract

Deep neural networks (DNNs) have attracted considerable attention in various real-world applications due to their strong performance in representation learning. However, running a DNN requires tremendous memory resources, which significantly restricts DNNs from being deployed on resource-constrained platforms (e.g., IoT and mobile devices). Lightweight DNNs are designed to suit the characteristics of mobile devices, but the hardware resources of mobile and IoT devices are extremely limited, so the resource consumption of lightweight models needs to be reduced further. However, current neural network compression approaches (e.g., pruning, quantization, and knowledge distillation) work poorly on lightweight DNNs, which are already simplified. In this paper, we present a novel framework called Smart-DNN, which can efficiently reduce the memory requirements of running DNNs on resource-constrained platforms. Specifically, we slice a neural network into several segments and use SZ error-bounded lossy compression to compress each segment separately while keeping the network structure unchanged. When running the network, we first store the compressed network in memory and then partially decompress the corresponding parts layer by layer. According to experimental results on four popular lightweight DNNs (commonly used on resource-constrained platforms), Smart-DNN achieves memory savings of 1/10∼1/5 while only slightly sacrificing inference accuracy, keeping the neural network structure unchanged, and incurring acceptable extra runtime overhead.
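
The following is a minimal sketch of the compress-per-segment, decompress-per-layer idea described above, assuming a plain feed-forward network stored as per-layer weight matrices. A simple error-bounded uniform quantizer plus zlib stands in for the SZ compressor, and the toy network, shapes, and error bound are illustrative assumptions rather than the paper's actual implementation.

```python
# Sketch: keep all layer weights compressed in memory and decompress only the
# layer currently being executed, as Smart-DNN does with SZ lossy compression.
import zlib
import numpy as np

def compress_layer(w, err_bound=1e-3):
    """Quantize weights onto an error-bounded grid (|w - w'| <= err_bound),
    then deflate the integer codes. Stand-in for SZ error-bounded compression."""
    step = 2 * err_bound
    q = np.round(w / step).astype(np.int32)
    return zlib.compress(q.tobytes()), w.shape, step

def decompress_layer(blob, shape, step):
    """Recover an approximation of the original weights for one layer."""
    q = np.frombuffer(zlib.decompress(blob), dtype=np.int32).reshape(shape)
    return q.astype(np.float32) * step

# Toy 3-layer MLP with random weights; only the compressed blobs stay resident.
rng = np.random.default_rng(0)
layers = [rng.standard_normal((64, 64)).astype(np.float32) for _ in range(3)]
compressed = [compress_layer(w) for w in layers]

def infer(x):
    # Decompress one layer at a time, use it, then discard it, so at most one
    # dense weight matrix is in memory alongside the compressed network.
    for blob, shape, step in compressed:
        w = decompress_layer(blob, shape, step)
        x = np.maximum(x @ w, 0.0)  # dense layer + ReLU
        del w
    return x

print(infer(rng.standard_normal((1, 64)).astype(np.float32)).shape)
```

The design trade-off this illustrates is the one quantified in the abstract: memory drops to the size of the compressed blobs, at the cost of a small accuracy loss from the error-bounded reconstruction and extra runtime spent decompressing each layer on demand.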
