Abstract
Distillation techniques help transform cumbersome neural networks into compact ones so that models can be deployed on a wider range of hardware devices. The main advantages of distillation-based approaches include a simple training process, support by most off-the-shelf deep learning software, and no special hardware requirements. In this paper, we propose a guideline for distilling the architecture and knowledge of pretrained standard CNNs. The proposed algorithm is first verified on a large-scale task: offline handwritten Chinese text recognition (HCTR). Compared with the CNN in the state-of-the-art system, the reconstructed compact CNN reduces the computational cost by more than 10× and the model size by more than 8× with negligible accuracy loss. Then, through experiments on two additional classification datasets, Chinese Text in the Wild (CTW) and MNIST, we demonstrate that the proposed approach can also be successfully applied to mainstream backbone networks.