Abstract

Deep Neural Networks (DNNs) are a computing paradigm that has achieved remarkable success across many fields of engineering in recent years, most notably visual recognition. DNNs owe this success to a very large number of weight parameters (and increased depth), which in turn incurs substantial computation and memory costs at deployment. These costs hinder the scalability of such models on resource-constrained devices (e.g., IoT devices). DNNs are generally believed to be over-parameterized, i.e., their parameters are highly redundant and can be structurally removed without significant loss of performance. To address these issues, we propose a non-convex T$\ell_1$ regularizer combined with the additional effect of sparse group lasso to remove redundant neurons/filters entirely, that is, to introduce structured sparsity. The network is trained with the proximal gradient method, which is well suited to optimizing objectives that combine smooth and non-smooth terms. We show that the proposed regularizer achieves competitive performance while yielding extremely compact networks. Detailed experiments on several benchmark datasets illustrate the efficiency of the approach. On the ImageNet dataset, our approach removes more than 50% of the parameters of the convolutional layers and 85% of the parameters of the fully connected layers of AlexNet with no drop in accuracy.
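
As a minimal sketch of the kind of objective the abstract describes (the penalty weights $\lambda_1, \lambda_2$, the grouping $\mathcal{G}$ over neurons/filters, and the step size $\eta$ are illustrative assumptions, and we take the standard transformed-$\ell_1$ form for the T$\ell_1$ penalty; none of these are the paper's reported settings), training minimizes a smooth loss $L(W)$ plus non-smooth regularization terms:

$$\min_{W}\; L(W) \;+\; \lambda_1 \sum_{i} \frac{(a+1)\,|w_i|}{a+|w_i|} \;+\; \lambda_2 \sum_{g \in \mathcal{G}} \lVert W_g \rVert_2, \qquad a > 0.$$

The proximal gradient method then alternates a gradient step on the smooth loss with the proximal operator of the non-smooth part $R$:

$$W^{k+1} \;=\; \operatorname{prox}_{\eta R}\!\bigl(W^{k} - \eta\, \nabla L(W^{k})\bigr).$$

In this sketch, the group norm is what can zero out an entire neuron or filter at once (structured sparsity), while the T$\ell_1$ term promotes element-wise sparsity among the remaining weights.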
