Abstract

Recent work on convolutional neural networks (CNNs) has attempted to find better local optima with ensemble-based approaches. Fast Geometric Ensembling (FGE) showed that weight points captured near the end of training circulate around local optima. This led to the Stochastic Weight Averaging (SWA) approach, which averages multiple weight points to reach a better local optimum. However, both approaches output fully parameterized models that retain needless parameters after training. To solve this problem, we propose a novel training procedure: Stochastic Weight Averaging by One-way Variational Pruning (SWA-OVP). SWA-OVP reduces the number of model parameters by variationally updating a pruning mask over the weights. Whereas recent pruning approaches produce the mask only at the end of training, SWA-OVP variationally generates the mask for pruned weights at every iteration. In addition, SWA-OVP prunes the model in a single one-way training pass, whereas other recent approaches prune weights through iterative retraining or require additional computation. Our experiments show that SWA-OVP, using only 0.5x $\sim$ 0.7x of the parameters, achieves even higher accuracy than SWA and FGE on several networks, such as Pre-ResNet110, Pre-ResNet164, and WideResNet28x10, on the CIFAR10 and CIFAR100 datasets. SWA-OVP also outperforms state-of-the-art pruning approaches.
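The two ingredients the abstract describes can be summarized in a short sketch: keep a running SWA average of the weights while a per-weight keep-probability (the pruning mask) is updated at every iteration and only ever decreases ("one-way"). The PyTorch-style sketch below is a minimal illustration under strong assumptions; the toy model, the magnitude-based update rule, the 0.99 decay factor, and the final 0.5 threshold are hypothetical choices for illustration, not the authors' variational formulation.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model and random data stand in for Pre-ResNet / WideResNet on CIFAR.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

# Running SWA average of the weights (one entry per parameter tensor).
swa_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
n_avg = 1

# Keep-probabilities acting as the per-weight mask parameters; they are
# updated every iteration and only ever decrease ("one-way").
keep_prob = {k: torch.ones_like(v) for k, v in model.state_dict().items() if v.dim() > 1}

for step in range(200):
    x, y = torch.randn(32, 20), torch.randint(0, 10, (32,))

    # Sample a binary mask from the current keep-probabilities and apply it
    # to the weights before the forward pass.
    with torch.no_grad():
        for k, p in keep_prob.items():
            model.state_dict()[k].mul_(torch.bernoulli(p))

    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()

    # Assumed mask update: lower the keep-probability of the smallest-magnitude
    # weights a little each iteration (a monotone move toward pruning).
    with torch.no_grad():
        for k, p in keep_prob.items():
            w = model.state_dict()[k].abs()
            thresh = w.flatten().kthvalue(max(1, int(0.3 * w.numel()))).values
            p.mul_(torch.where(w < thresh, torch.full_like(w, 0.99), torch.ones_like(w)))

    # SWA: fold the current weights into the running average every few steps.
    if step % 10 == 0:
        with torch.no_grad():
            for k, v in model.state_dict().items():
                swa_state[k].mul_(n_avg / (n_avg + 1)).add_(v / (n_avg + 1))
        n_avg += 1

# Final model: load the averaged weights and hard-prune low-probability entries.
model.load_state_dict(swa_state)
with torch.no_grad():
    for k, p in keep_prob.items():
        model.state_dict()[k].mul_((p > 0.5).float())

Because the mask is refreshed at every iteration and its keep-probabilities never increase, pruning happens inside the single training pass rather than as a separate retraining stage, which is the property the abstract contrasts against iterative pruning schemes.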
