Abstract
Deep neural networks are state of the art in a large number of machine learning challenges. However, to reach the best performance, they require a huge pool of parameters. Indeed, typical deep convolutional architectures present an increasing number of feature maps as we go deeper in the network, whereas the spatial resolution of inputs is decreased through downsampling operations. This means that most of the parameters lie in the final layers, while a large portion of the computations is performed by a small fraction of the total parameters in the first layers. In an effort to use every parameter of a network to its fullest, we propose a new convolutional neural network architecture, called ThriftyNet. In ThriftyNet, only one convolutional layer is defined and used recursively, leading to a maximal parameter factorization. In complement, normalization, non-linearities, downsampling and shortcut connections ensure sufficient expressivity of the model. ThriftyNet achieves competitive performance on a tiny parameter budget, exceeding 91% accuracy on CIFAR-10 with less than 40 k parameters in total, 74.3% on CIFAR-100 with less than 600 k parameters, and 67.1% on ImageNet ILSVRC 2012 with no more than 4.15 M parameters. However, the proposed method typically requires more computations than existing counterparts.
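The following is a minimal PyTorch sketch of the recursive-convolution idea described in the abstract. It is not the authors' implementation: the number of filters, the iteration count, the downsampling schedule and the per-iteration normalization are illustrative assumptions, chosen only to show how a single shared convolution, combined with non-linearities, normalization, downsampling and shortcut connections, can form a complete classifier.

```python
# Illustrative sketch only; the hyperparameters below are assumptions,
# not the settings reported in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ThriftyLikeNet(nn.Module):
    def __init__(self, num_filters=64, num_iters=12, num_classes=10):
        super().__init__()
        # Embed the 3-channel input into the working number of feature maps.
        self.embed = nn.Conv2d(3, num_filters, 3, padding=1, bias=False)
        # The single convolution whose weights are reused at every iteration.
        self.conv = nn.Conv2d(num_filters, num_filters, 3, padding=1, bias=False)
        # One cheap BatchNorm per iteration (an assumption; adds few parameters).
        self.norms = nn.ModuleList(
            [nn.BatchNorm2d(num_filters) for _ in range(num_iters)]
        )
        self.fc = nn.Linear(num_filters, num_classes)
        self.num_iters = num_iters

    def forward(self, x):
        h = self.embed(x)
        for t in range(self.num_iters):
            # Shared convolution + non-linearity + normalization,
            # with a shortcut back to the previous activation.
            h = h + self.norms[t](F.relu(self.conv(h)))
            # Downsample every fourth iteration (illustrative schedule).
            if (t + 1) % 4 == 0:
                h = F.max_pool2d(h, 2)
        h = F.adaptive_avg_pool2d(h, 1).flatten(1)
        return self.fc(h)


model = ThriftyLikeNet()
print(sum(p.numel() for p in model.parameters()))  # ~41 k parameters with these settings
```

With these assumed settings the parameter count lands on the same order as the 40 k budget quoted for CIFAR-10 in the abstract, but the accuracy figures depend on many details that this sketch does not reproduce.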
Highlights
Distillation techniques consist of training a deep neural network, termed the 'student', to reproduce the outputs of another model, termed the 'teacher', with the student typically being smaller than the teacher (a generic loss of this kind is sketched after these highlights).
In an effort to reduce the number of parameters in deep convolutional neural networks, it is usual to target the deepest layers first.
One could believe that ThriftyNets are unlikely to reach top performance, as generic deep neural networks are believed to produce increasingly abstract features as we go deeper in their architectures, whereas ThriftyNets apply the same convolution at every depth.
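For reference, the distillation setup mentioned in the first highlight is commonly implemented as a loss that mixes the usual label term with a term matching the student's softened outputs to the teacher's. The sketch below is generic and not taken from the paper; the temperature T and the weighting alpha are assumptions.

```python
# Generic knowledge-distillation loss (illustrative; not from the ThriftyNet paper).
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Soft term: the student reproduces the teacher's softened output distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard term: standard cross-entropy with the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```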
Summary
We focus on reducing the number of parameters of architectures, which is usually strongly connected to the memory usage of the model. In this area, factorizing methods, which identify similar sets of parameters and merge them [4], are effective, in that they considerably reduce the number of parameters while maintaining the same global structure and number of FLOPs. We propose to introduce a new factorized deep learning model, in which the factorization is not learned during training, but rather imposed at the creation of the model. We call these models ThriftyNets, as they typically contain a very constrained number of parameters, while achieving top-tier results on standard vision classification datasets.
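To make "imposed at the creation of the model" concrete, the following short comparison (with arbitrary, assumed layer sizes) shows that reusing a single convolution keeps the parameter count independent of depth, unlike a standard stack of distinct convolutions.

```python
# Illustrative parameter-count comparison; layer sizes are assumptions.
import torch.nn as nn

depth, channels = 10, 64

# Standard stack: every layer owns its own 3x3 convolution weights.
unshared = nn.Sequential(
    *[nn.Conv2d(channels, channels, 3, padding=1, bias=False) for _ in range(depth)]
)

# Imposed factorization: one convolution created once and reused `depth` times,
# so the same effective depth costs only a single layer's worth of parameters.
shared = nn.Conv2d(channels, channels, 3, padding=1, bias=False)

print(sum(p.numel() for p in unshared.parameters()))  # 368,640
print(sum(p.numel() for p in shared.parameters()))    # 36,864
```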