Abstract

Deep neural networks are the state of the art in a large number of machine learning challenges. However, to reach the best performance they require a huge number of parameters. Indeed, typical deep convolutional architectures present an increasing number of feature maps as we go deeper in the network, while the spatial resolution of the inputs is reduced through downsampling operations. This means that most of the parameters lie in the final layers, while a large portion of the computations is performed by a small fraction of the total parameters in the first layers. In an effort to use every parameter of a network to its fullest, we propose a new convolutional neural network architecture, called ThriftyNet. In ThriftyNet, only one convolutional layer is defined and used recursively, leading to a maximal parameter factorization. In addition, normalization, non-linearities, downsamplings and shortcuts ensure sufficient expressivity of the model. ThriftyNet achieves competitive performance on a tiny parameter budget, exceeding 91% accuracy on CIFAR-10 with fewer than 40 k parameters in total, 74.3% on CIFAR-100 with fewer than 600 k parameters, and 67.1% on ImageNet ILSVRC 2012 with no more than 4.15 M parameters. However, the proposed method typically requires more computations than existing counterparts.
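Below is a minimal sketch of the kind of recursive architecture described above, assuming a PyTorch-style implementation. The class name, filter count, number of iterations, shortcut mixing and downsampling schedule are illustrative assumptions, not the authors' exact configuration; the sketch only shows how a single convolution can be reused at every iteration while per-iteration normalization, a non-linearity, a shortcut and occasional pooling provide expressivity.

# Minimal sketch of a ThriftyNet-style model (illustrative, not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ThriftySketch(nn.Module):
    def __init__(self, n_filters=128, n_iters=15, n_classes=10, downsample_at=(5, 10)):
        super().__init__()
        self.embed = nn.Conv2d(3, n_filters, kernel_size=3, padding=1)          # project input to n_filters channels
        self.conv = nn.Conv2d(n_filters, n_filters, kernel_size=3, padding=1)   # the ONE shared convolution
        self.bns = nn.ModuleList(nn.BatchNorm2d(n_filters) for _ in range(n_iters))  # per-iteration normalization
        self.alpha = nn.Parameter(torch.full((n_iters,), 0.5))                  # shortcut mixing coefficients
        self.fc = nn.Linear(n_filters, n_classes)
        self.n_iters = n_iters
        self.downsample_at = set(downsample_at)

    def forward(self, x):
        h = self.embed(x)
        for t in range(self.n_iters):
            # the same convolutional parameters are applied at every iteration (maximal factorization)
            h = self.alpha[t] * self.bns[t](F.relu(self.conv(h))) + (1 - self.alpha[t]) * h
            if t in self.downsample_at:
                h = F.max_pool2d(h, 2)                                           # reduce spatial resolution
        h = F.adaptive_avg_pool2d(h, 1).flatten(1)                               # global average pooling
        return self.fc(h)

# Usage (hypothetical): model = ThriftySketch(); logits = model(torch.randn(8, 3, 32, 32))

In such a sketch the parameter budget is dominated by the single shared convolution, regardless of how many iterations are performed, which is what keeps the total count so low.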

Highlights

  • Distillation techniques consist of training a deep neural network, termed the ‘student’, to reproduce the outputs of another model, termed the ‘teacher’, with the student typically being smaller than the teacher (a generic sketch of this objective follows this list)

  • In an effort to reduce the number of parameters in deep convolutional neural networks, it is common to target the deepest layers first

  • One could believe that ThriftyNets are unlikely to reach top performance, as generic deep neural networks are believed to produce increasingly abstract features as we go deeper in their architectures, whereas ThriftyNets use the same features at every depth
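For context on the first highlight, here is a generic sketch of a knowledge-distillation objective, written in PyTorch; the temperature and weighting values are arbitrary assumptions and are not specific to this paper.

# Generic knowledge-distillation loss sketch (illustrative; not specific to ThriftyNet).
# The student is trained to match the teacher's softened outputs in addition to the true labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # soft targets: KL divergence between temperature-scaled output distributions
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # hard targets: usual cross-entropy with the ground-truth labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard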


Summary

Introduction

We focus on reducing the number of parameters of architectures, which is usually strongly connected to the memory usage of the model. In this area, factorizing methods, which identify similar sets of parameters and merge them [4], are effective, in that they considerably reduce the number of parameters while maintaining the same global structure and number of FLOPs. We propose to introduce a new factorized deep learning model, in which the factorization is not learned during training, but rather imposed at the creation of the model. We call these models ThriftyNets, as they typically contain a very constrained number of parameters, while achieving top-tier results on standard classification vision datasets.
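As a toy illustration of how imposing the factorization at creation constrains the parameter budget, the following sketch compares an independent stack of convolutions with a single shared convolution reused the same number of times; the filter count and depth are arbitrary assumptions chosen only for the example.

# Parameter budget of independent convolution layers vs. a single shared (factorized-at-creation) one.
import torch.nn as nn

def count_params(module):
    return sum(p.numel() for p in module.parameters())

n_filters, depth = 128, 15

# standard approach: one independent 3x3 convolution per layer
stack = nn.Sequential(*[nn.Conv2d(n_filters, n_filters, 3, padding=1) for _ in range(depth)])

# factorization imposed at creation: a single 3x3 convolution reused at every iteration
shared = nn.Conv2d(n_filters, n_filters, 3, padding=1)

print(f"independent layers: {count_params(stack):,} parameters")   # ~2.2 M
print(f"shared layer:       {count_params(shared):,} parameters")  # ~148 k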

Related Work
Pruning
Quantization
Distillation
Efficient Scaling
Factorization
Recurrent Residual Networks as ODE
Context
Thrifty Networks
Augmented Thrifty Networks
Pooling Strategy
Grouped Convolutions
Hyperparameters and Size of the Model
Depth and Abstraction
Experiments
Impact of Data Augmentation
Comparison with Standard Architectures
Factorization and Filter Usage
Efficient ThriftyNets
Effect of the Number of Iterations
Effect of the Number of Filters
Effect of the Number of Downsamplings
Freezing the Shortcut Parameters in an Augmented ThriftyNet
Findings
Conclusions
