Delta-DNN: Efficiently Compressing Deep Neural Networks via Exploiting Floats Similarity

Zhenbo Hu, Wen Xia, Weizhe Zhang, Dingwen Tao, Sian Jin, Yang Liu, Zheng Zhang, Xiangyu Zou

doi:10.1145/3404397.3404408

Abstract

Deep neural networks (DNNs) have gained considerable attention in various real-world applications due to their strong performance in representation learning. However, a DNN must be trained for many epochs to reach higher inference accuracy, which requires storing sequential versions of the DNN and releasing the updated versions to users. As a result, large amounts of storage and network resources are required, significantly hampering DNN utilization on resource-constrained platforms (e.g., IoT devices, mobile phones). In this paper, we present a novel delta compression framework called Delta-DNN, which efficiently compresses the floating-point numbers in DNNs by exploiting the similarity of floats that exists in DNNs during training. Specifically, (1) we observe high similarity between the floating-point numbers of neighboring versions of a neural network during training; (2) inspired by the delta compression technique, we record only the delta (i.e., the differences) between two neighboring versions, instead of storing the full new version of the DNN; (3) we use error-bounded lossy compression to compress the delta data for a high compression ratio, where the error bound is strictly constrained by an acceptable loss of the DNN’s inference accuracy; (4) we evaluate Delta-DNN’s performance in two scenarios: reducing the network transmission needed to release DNNs, and saving the storage space occupied by multiple versions of DNNs. According to experimental results on six popular DNNs, Delta-DNN achieves a compression ratio 2× to 10× higher than state-of-the-art methods, without sacrificing inference accuracy or changing the neural network structure.
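
The idea in steps (2) and (3) can be illustrated with a minimal Python sketch. It is not the Delta-DNN implementation: the function names `encode_delta` and `decode_delta`, the uniform quantizer with step `2 * error_bound`, the use of `zlib` as the lossless back end, and the synthetic weights are all assumptions made for illustration. The sketch only shows that, when two versions are similar, the quantized differences are small integers that compress well, and that each restored weight stays within the chosen absolute error bound.

```python
import random
import struct
import zlib


def encode_delta(prev, curr, error_bound):
    """Quantize (curr - prev) to integers, then compress them losslessly.

    Assumption: a uniform quantizer with step 2 * error_bound, so each
    restored weight differs from the original by at most error_bound.
    """
    step = 2.0 * error_bound
    codes = [round((c - p) / step) for p, c in zip(prev, curr)]
    raw = struct.pack(f"<{len(codes)}i", *codes)  # 32-bit signed integers
    return zlib.compress(raw, 9)


def decode_delta(prev, blob, error_bound):
    """Rebuild an approximation of the new version from prev and the delta."""
    step = 2.0 * error_bound
    raw = zlib.decompress(blob)
    codes = struct.unpack(f"<{len(raw) // 4}i", raw)
    return [p + q * step for p, q in zip(prev, codes)]


# Synthetic stand-in for two neighboring versions of a layer's weights.
random.seed(0)
prev = [random.uniform(-1.0, 1.0) for _ in range(10_000)]
curr = [w + random.gauss(0.0, 1e-3) for w in prev]
error_bound = 1e-3  # hypothetical value; in practice tied to accuracy loss

blob = encode_delta(prev, curr, error_bound)
restored = decode_delta(prev, blob, error_bound)

# Size of the new version if stored in full as 32-bit floats.
full_size = len(struct.pack(f"<{len(curr)}f", *curr))
max_error = max(abs(a - b) for a, b in zip(curr, restored))

print("full version bytes:", full_size)
print("compressed delta bytes:", len(blob))
print("max absolute error:", max_error)
```

In this sketch the error bound is a fixed constant. The abstract instead describes choosing it according to an acceptable loss of inference accuracy, which would require evaluating the restored network on validation data.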

Full Text