Abstract

Convolutional Recurrent Neural Networks (CRNNs) have achieved great success in OCR. However, existing deep models usually apply down-sampling in the pooling operation to reduce feature size, discarding some feature information and potentially causing characters that occupy only a small region to be missed. Moreover, every hidden unit in the recurrent module must be connected within the recurrent layer, which can impose a heavy computational burden. In this paper, we explore improving recognition by using a Dense Convolutional Network (DenseNet) in place of the CRNN's convolutional network to connect and combine multiple features. We also use an up-sampling function to construct an Up-Sampling block that mitigates the negative effects of down-sampling in the pooling stage and restores the lost information to a certain extent, so that informative features can still be extracted with a deeper structure. In addition, we directly use the output of the inner convolutional parts to describe the label distribution of each frame, making the process efficient. Finally, we propose a new OCR framework, termed DenseNet with Up-Sampling block, joined with connectionist temporal classification (CTC), for Chinese recognition. Results on a Chinese string dataset show that our model delivers enhanced performance compared with several popular deep frameworks.
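The interplay between pooling down-sampling, the Up-Sampling block, and DenseNet-style feature combination described above can be sketched in NumPy. This is an illustrative sketch only, not the paper's implementation: the function names are hypothetical, max pooling stands in for the pooling stage, nearest-neighbour repetition stands in for the up-sampling function, and channel concatenation stands in for DenseNet's dense connectivity.

```python
import numpy as np

def max_pool_2x(x):
    # 2x2 max pooling with stride 2: halves each spatial dimension,
    # dropping fine detail (the down-sampling the paper seeks to offset).
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample_2x(x):
    # Nearest-neighbour up-sampling: doubles each spatial dimension,
    # restoring the resolution lost in pooling (a stand-in for the
    # paper's Up-Sampling block).
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def dense_concat(feature_maps):
    # DenseNet-style combination: feature maps from different layers are
    # concatenated (here along axis 0 as a stand-in for the channel axis),
    # so later layers see all earlier features.
    return np.concatenate(feature_maps, axis=0)

# A feature map, pooled and then restored to its original resolution.
x = np.arange(16.0).reshape(4, 4)
pooled = max_pool_2x(x)              # shape (2, 2): detail lost
restored = upsample_2x(pooled)       # shape (4, 4): resolution restored
fused = dense_concat([x, restored])  # shape (8, 4): features combined
```

The restored map matches the original spatial size, so it can be concatenated with earlier feature maps; in the actual framework the per-frame label distributions fed to CTC are produced from such combined convolutional features rather than a recurrent layer.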
