Abstract

Deep convolutional neural networks (CNNs) have a vast number of parameters, especially in the fully connected (FC) layers, which have become a bottleneck for real-time applications where computational cost drives up processing latency. In this paper, we propose to optimize the FC layers of a CNN by making them much slimmer. We analyze the statistical distribution of the weights in the FC layers and observe that each column follows a Gaussian distribution. Regression analysis of the FC-layer weights based on the Akaike information criterion and the Bayesian information criterion demonstrates that they exhibit Granger causality, meaning the columns are correlated and jointly follow a colored Gaussian distribution. Based on this distribution, we derive a CNN design and optimization theorem for FC layers from an information-theoretic point of view. The theorem provides two design criteria: rank and singular values. Further, we show that an FC layer whose weights are colored Gaussian is more efficient than one whose weights are white Gaussian. Since the optimization criterion is based on singular values, we apply singular value decomposition to find the maximal singular values and QR decomposition to identify the corresponding columns of the FC layer. We evaluate our optimization approach on AlexNet and apply the slimmer CNN to ImageNet classification. Simulation results show that our approach performs much better than random dropout. Specifically, with only around $28\%$ of the weights, the slimmed AlexNet performs as well as the original AlexNet in terms of top-1 and top-5 error.
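The column-selection step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the `keep_ratio` parameter, and the use of QR with column pivoting on the leading right singular vectors are assumptions made for the example.

```python
import numpy as np
from scipy.linalg import qr, svd  # pivoted QR requires SciPy

def select_fc_columns(W, keep_ratio=0.28):
    """Keep the FC-layer columns most aligned with the dominant
    singular subspace of the weight matrix W (shape: out x in).

    1. SVD exposes the largest singular values, which carry most
       of the layer's energy.
    2. QR with column pivoting on the top-k right singular vectors
       ranks the original columns by their contribution.
    """
    n = W.shape[1]
    k = max(1, int(round(keep_ratio * n)))
    # W = U @ diag(s) @ Vt; rows of Vt are right singular vectors.
    U, s, Vt = svd(W, full_matrices=False)
    # Pivot order of the top-k right singular vectors ranks columns of W.
    _, _, piv = qr(Vt[:k], pivoting=True)
    keep = np.sort(piv[:k])
    return keep, W[:, keep]

# Example: keep ~28% of the columns of a random 64x100 FC weight matrix.
keep, W_slim = select_fc_columns(np.random.randn(64, 100), keep_ratio=0.28)
```

In practice the retained columns would replace the original FC weight matrix, with the downstream layer's input dimension reduced to match.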
