Abstract

Bridging the performance gap between binary convolutional neural networks (BCNNs) and floating-point CNNs (FCNNs) is challenging. This gap stems mainly from the inferior modeling capability and training strategy of BCNNs, which lead to substantial residuals between the intermediate feature maps of a BCNN and those of its FCNN counterpart. To narrow the gap, we encourage the BCNN to produce intermediate feature maps similar to those of the FCNN. This intuition leads to a more effective training strategy for BCNNs: optimizing each binary convolutional block with a blockwise distillation loss derived from the FCNN. The goal of minimizing the residuals in intermediate feature maps also motivates an updated binary convolutional block architecture that eases the optimization of the blockwise distillation loss. Specifically, a lightweight shortcut branch is inserted into each binary convolutional block to complement the residuals at that block. Benefiting from its squeeze-and-interaction (SI) structure, this shortcut branch adds only a small fraction of parameters, e.g., less than 10% overhead, yet effectively boosts the modeling capability of the binary convolutional blocks. Extensive experiments on ImageNet demonstrate the superior efficiency and accuracy of our method: a BCNN trained with our approach achieves 60.45% accuracy on ImageNet, outperforming many state-of-the-art methods.
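As a rough illustration of the blockwise distillation idea described above, the following PyTorch-style sketch accumulates a per-block loss between the feature maps of binary (student) blocks and the corresponding floating-point (teacher) blocks. The function name, the use of an MSE loss, the frozen teacher, and the block pairing are assumptions for illustration only; the paper's exact loss formulation and training schedule may differ.

```python
import torch
import torch.nn as nn

def blockwise_distillation_loss(student_blocks, teacher_blocks, x):
    """Sum a per-block loss that pushes each binary block's output feature map
    toward the output of the matching floating-point block (a sketch)."""
    loss = 0.0
    s, t = x, x
    for s_block, t_block in zip(student_blocks, teacher_blocks):
        s = s_block(s)                       # binary convolutional block (student)
        with torch.no_grad():
            t = t_block(t)                   # floating-point block (teacher), frozen
        # Penalize the residual between intermediate feature maps at this block.
        loss = loss + nn.functional.mse_loss(s, t)
    return loss
```

In this sketch, `student_blocks` and `teacher_blocks` are assumed to be aligned lists of `nn.Module` blocks; the lightweight SI shortcut branch would live inside each student block, which is why it is not spelled out here.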
