Abstract

Binarized neural networks (BNNs) have drawn significant attention in recent years owing to their great potential for reducing computation and storage costs. Despite this appeal, traditional BNNs usually suffer from slow convergence and severe accuracy degradation on large-scale classification datasets. To narrow the gap between BNNs and deep neural networks (DNNs), we propose a new framework for designing BNNs, dubbed Hyper-BinaryNet, from the perspective of enhanced information flow. Our contributions are threefold: 1) To address the limited capacity of the backward pass, we propose a 1-bit convolution module named HyperConv. By exploiting the capacity of an auxiliary neural network, BNNs achieve better performance on large-scale image classification tasks. 2) To address the slow convergence of BNNs, we rethink the gradient accumulation mechanism and propose a hyper accumulation technique. By accumulating gradients in multiple variables rather than a single one, the number of gradient paths for each weight increases, freeing BNNs from the gradient bottleneck during training. 3) To address the ill-posed optimization problem, we develop a novel gradient estimation warmup strategy, dubbed STE-Warmup. This strategy avoids unstable optimization by progressively transferring the network from 32-bit to 1-bit precision. We evaluate several architectures on three public datasets: CIFAR-10/100 and ImageNet. Compared with state-of-the-art BNNs, Hyper-BinaryNet converges faster and outperforms existing BNNs by a large margin.
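The abstract describes STE-Warmup as progressively transferring the network from 32-bit to 1-bit precision via a gradient estimation warmup. The paper's exact schedule is not given here, so the PyTorch snippet below is only a minimal sketch of one plausible realization: a standard clipped straight-through estimator (STE) for the sign function, combined with a hypothetical linear blend between full-precision and binarized weights. The names `BinarizeSTE` and `warmup_weight` and the linear schedule are illustrative assumptions, not the authors' implementation.

```python
import torch


class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a straight-through estimator (STE) backward pass."""

    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # Standard clipped STE: pass the gradient through, zeroed where |w| > 1.
        return grad_out * (w.abs() <= 1).float()


def warmup_weight(w_fp32, step, warmup_steps):
    """Blend full-precision and binarized weights over a warmup schedule (assumed linear).

    alpha ramps from 0 (purely 32-bit weights) to 1 (purely 1-bit weights),
    so binarization is introduced progressively rather than abruptly.
    """
    alpha = min(step / warmup_steps, 1.0)
    w_bin = BinarizeSTE.apply(w_fp32)
    return alpha * w_bin + (1.0 - alpha) * w_fp32
```

In such a sketch, the blended weight would be used in place of the raw binary weight during the early epochs of training, after which the network operates on fully binarized weights.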
