Improved Telugu Scene Text Recognition with Thin Plate Spline Transform

Nandam Srinivasa Rao, Atul Negi

doi:10.1007/978-981-16-9113-3_65

Abstract

Scene text recognition is a difficult task because of complex backgrounds, varying text orientations, changing lighting conditions, and noise introduced by the devices used to capture the images. The difficulty increases when the training data has very few samples, as is the case for Telugu scene text recognition. This paper addresses the issues caused by complex text shapes and the lack of large training datasets for Telugu scene text recognition. We apply a thin plate spline (TPS) transform as a preprocessor to the text recognizer to handle the complexity caused by irregular text shapes. The text recognition model is based on the convolutional recurrent network (CRNN), which has been used for various traditional OCR and Telugu scene detection applications. It uses a ResNet-based feature extractor, which is much more successful at extracting rich features than the VGG extractor used in traditional CRNN models. The features extracted by ResNet are passed to a bidirectional LSTM, whose outputs are passed to a final prediction layer that uses a softmax classifier. Connectionist temporal classification (CTC) is used as the loss function. Instead of training from scratch, the Telugu text recognition models are initialized with weights trained on large English scene text datasets (SynthText, MJSynth). We show that these additions improve the normalized edit distance of the network by a large margin and produce a better scene text recognition framework for Telugu text. The recognizer performs well under complex text orientations and the varying fonts, shapes, and highly varying characters present in Telugu text.
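To illustrate how per-frame softmax outputs from a CTC-trained prediction layer can be turned into a character sequence, the following is a minimal sketch of greedy (best-path) CTC decoding. The abstract does not state which decoding strategy was used, so greedy decoding, the function name `ctc_greedy_decode`, the blank index 0, and the toy character set are all assumptions made for illustration.

```python
from itertools import groupby

BLANK = 0  # assumption: class index 0 is reserved for the CTC blank symbol


def ctc_greedy_decode(frame_probs, charset):
    """Decode per-frame class probabilities with the best-path CTC rule.

    frame_probs: list of lists, one probability vector per time step.
    charset: list of characters; class k (k >= 1) maps to charset[k - 1].
    """
    # Pick the most probable class index at each time step.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    # Collapse runs of identical indices into a single index.
    collapsed = [idx for idx, _ in groupby(best_path)]
    # Drop blanks and map the remaining indices to characters.
    return "".join(charset[idx - 1] for idx in collapsed if idx != BLANK)


if __name__ == "__main__":
    # Hypothetical five-symbol Telugu character set (code points, not full glyphs).
    charset = ["త", "ె", "ల", "ు", "గ"]
    path = [1, 1, 0, 2, 3, 0, 4, 5, 5, 0, 4]
    # Build one-hot "probabilities" that follow the path above.
    frames = [[1.0 if k == idx else 0.0 for k in range(6)] for idx in path]
    print(ctc_greedy_decode(frames, charset))
```

The blank symbol is what lets the decoder keep a genuinely repeated character: two identical indices separated by a blank survive the collapse step, while two adjacent identical indices are merged.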
We also show that, when applied to Telugu text data, the network achieves better normalized edit distance and faster convergence if it is initialized with weights trained on English scene text datasets. This emphasizes the importance of proper weight initialization and the benefits of fine-tuning for producing a robust framework for Telugu scene text detection.
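Since the results are reported in terms of normalized edit distance, a small sketch of how such a metric can be computed may help. The abstract does not give the exact definition used, so the version below assumes one common convention: the Levenshtein distance divided by the length of the longer string. Benchmarks often report one minus this value, so that higher is better. The function names are illustrative.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, or substitutions
    needed to turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred, target):
    """Edit distance scaled to [0, 1] by the longer string's length
    (assumed normalization; the paper may use a different one)."""
    longest = max(len(pred), len(target))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return levenshtein(pred, target) / longest


if __name__ == "__main__":
    ned = normalized_edit_distance("abcd", "abcf")
    print(ned, 1.0 - ned)
```

Note that Python strings are compared per code point here, so for Telugu a single visual glyph made of a consonant plus a vowel sign counts as two symbols. Whether the paper scores at the code-point or glyph level is not stated in the abstract.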

Full Text