Deep neural networks (DNNs) face technical issues such as training time that grows with network size. Their parameters also require significant memory, which can make it difficult to port them to embedded devices. Various pruning techniques have been applied to reduce the size of DNNs, but many problems remain when applying them. Among neural networks, autoencoders are used in several applications for reconstruction and dimensionality reduction. However, network size is a particular disadvantage of autoencoders, since their architecture carries a double workload from the encoding and decoding processes. In this research, we chose autoencoders and two deep neural networks, AlexNet and VGG16, as targets for out-of-order layer pruning. We perform a sensitivity analysis to explore how performance varies with network architecture and network complexity under an out-of-order layer pruning mechanism. By applying the proposed layer pruning scheme to the autoencoder, we developed the accordion autoencoder (A2E) and applied it to credit card fraud detection and MNIST classification. Our results show performance drops of 4.9% and 13.6%, respectively, but also a significant reduction in network complexity of 85.1% and 94.5%, respectively. We then extend out-of-order layer pruning to deeper networks. We propose a simple yet efficient scheme, accuracy-aware structured filter pruning, which is based on the characterization of each convolutional layer and is combined with quantization of the fully connected layers. We measure the accuracy and compression rate of each layer under a fixed pruning ratio, and then rearrange the pruning priority according to the accuracy of each layer. Our layer characterization analysis shows that the order in which layers are pruned does affect the final accuracy of the deep neural network.
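The layer characterization step can be illustrated with a minimal NumPy sketch. It assumes one plausible reading of the scheme: each convolutional layer is pruned in isolation at a fixed ratio, the resulting accuracy is recorded, and layers whose accuracy suffers least are pruned first. The L1-norm filter criterion, the function names (`prune_filters`, `characterize_layers`, `pruning_order`), and the `proxy_accuracy` stand-in for real validation accuracy are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np
from typing import Callable, Dict, List

Layers = Dict[str, np.ndarray]


def prune_filters(weight: np.ndarray, ratio: float) -> np.ndarray:
    """Structured pruning: drop the `ratio` fraction of output filters with the
    smallest L1 norm (the L1 criterion is an assumption, not from the source)."""
    n_filters = weight.shape[0]
    n_keep = max(1, int(round(n_filters * (1.0 - ratio))))
    norms = np.abs(weight).reshape(n_filters, -1).sum(axis=1)
    keep = np.sort(np.argsort(norms)[::-1][:n_keep])  # strongest filters, original order
    # NOTE: a real network must also drop the matching input channels of the next layer.
    return weight[keep]


def characterize_layers(layers: Layers, ratio: float,
                        evaluate: Callable[[Layers], float]) -> Dict[str, float]:
    """Prune one layer at a time with a fixed ratio and record the resulting accuracy."""
    accuracy = {}
    for name in layers:
        trial = dict(layers)  # shallow copy; the other layers stay untouched
        trial[name] = prune_filters(layers[name], ratio)
        accuracy[name] = evaluate(trial)
    return accuracy


def pruning_order(accuracy: Dict[str, float]) -> List[str]:
    """Least sensitive layers (highest accuracy after pruning) come first."""
    return sorted(accuracy, key=accuracy.get, reverse=True)


# Toy usage: conv_a has a few dominant filters, conv_b has uniform filters.
layers = {
    "conv_a": np.array([10.0, 10.0, 0.1, 0.1]).reshape(4, 1, 1, 1) * np.ones((4, 2, 3, 3)),
    "conv_b": np.ones((4, 2, 3, 3)),
}
original_mass = {k: np.abs(w).sum() for k, w in layers.items()}


def proxy_accuracy(trial: Layers) -> float:
    """Hypothetical stand-in for validation accuracy: mean fraction of L1 mass kept."""
    return float(np.mean([np.abs(trial[k]).sum() / original_mass[k] for k in trial]))


acc = characterize_layers(layers, ratio=0.5, evaluate=proxy_accuracy)
print(pruning_order(acc))  # ['conv_a', 'conv_b']
```

In practice `evaluate` would run the pruned network on a validation set, possibly after brief fine-tuning; the proxy here only keeps the sketch self-contained.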
Based on our experiments with the proposed pruning scheme, the parameter size of AlexNet can be reduced by up to 47.28x relative to the original model. We obtained comparable results for VGG16, with a maximum compression rate of 35.21x.
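A compression rate such as 47.28x is a ratio of original storage to compressed storage. The sketch below shows how pruning (fewer parameters) and quantization (fewer bits per parameter) combine in that ratio. It assumes 32-bit original weights and symmetric uniform quantization of the fully connected layers; the abstract does not state the quantization method or bit width, and the parameter counts used here are hypothetical, not figures from AlexNet or VGG16.

```python
import numpy as np
from typing import Sequence, Tuple


def quantize_uniform(w: np.ndarray, bits: int) -> Tuple[np.ndarray, float]:
    """Symmetric uniform quantization of weights to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int32)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map quantized integers back to approximate float weights."""
    return q.astype(np.float64) * scale


def compression_rate(orig_counts: Sequence[int], kept_counts: Sequence[int],
                     bits: Sequence[int], orig_bits: int = 32) -> float:
    """Original storage divided by compressed storage, per layer group.
    Ignores index and scale overhead, so the result is optimistic."""
    original = sum(orig_counts) * orig_bits
    compressed = sum(n * b for n, b in zip(kept_counts, bits))
    return original / compressed


# Hypothetical model: 1,000 conv weights pruned to 250 (kept at 32 bits),
# 3,000 FC weights kept but quantized to 8 bits.
print(compression_rate([1000, 3000], [250, 3000], [32, 8]))  # 4.0

w = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
q, s = quantize_uniform(w, bits=8)
print(np.max(np.abs(dequantize(q, s) - w)) <= s / 2 + 1e-12)  # True
```

Because the fully connected layers of AlexNet and VGG16 hold most of their parameters, reducing bits per weight there can contribute a large share of the overall ratio even when the convolutional layers are pruned only moderately.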