Abstract

Integrated convolutional neural network (CNN) and deep bidirectional long short-term memory (DBLSTM) based character models have achieved excellent recognition accuracies on optical character recognition (OCR) tasks, along with large amount of model parameters and massive computation cost. To deploy CNN-DBLSTM model in products with CPU server, there is an urgent need to compress and accelerate it as much as possible, especially the CNN part, which dominates both parameters and computation. In this paper, we study teacher-student learning and Tucker decomposition methods to reduce model size and runtime latency for CNN-DBLSTM based character model for OCR. We use teacher-student learning to transfer the knowledge of a large-size teacher model to a small-size compact student model, followed by Tucker decomposition to further compress the student model. For teacher-student learning, we design a novel learning criterion to bring in the guidance of succeeding LSTM layer when matching the CNN-extracted feature sequences of the large teacher and small student models. Experimental results on large scale handwritten and printed OCR tasks show that, using teacher-student learning alone achieves 9.90 × footprint reduction and 15.23 × inference speedup yet without degrading recognition accuracy. Combined with Tucker decomposition method, we can compress and accelerate the model further. The decomposed model achieves 11.89 × footprint reduction and 22.16 × inference speedup while suffering no or only a small recognition accuracy degradation against the large-size baseline model.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.