Abstract

Srinivasa Rao, Nandam; Negi, Atul

Scene text recognition is a difficult task because of complex backgrounds, varied text orientations, changing lighting conditions, and noise introduced by the devices used to capture the images. The difficulty increases when the training data contains very few samples, as is the case for Telugu scene text recognition. This paper addresses the issues caused by complex text shapes and the lack of large training datasets for Telugu scene text recognition. We apply a thin plate spline (TPS) transform as a preprocessor to the text recognizer to handle the complexity caused by irregular text shapes. The text recognition model is based on the convolutional recurrent network (CRNN), which has been used for various traditional OCR and Telugu scene detection applications. It uses a ResNet-based feature extractor, which is much more successful at extracting rich features than the VGG extractor used in traditional CRNN models. The features extracted by the ResNet are passed to a bidirectional LSTM, whose outputs are passed to a final prediction layer that uses a softmax classifier. Connectionist temporal classification (CTC) is used as the loss function. Instead of training from scratch, the Telugu text recognition models are initialized with weights trained on large English scene text datasets (SynthText, MJSynth), which gives a good starting point for the model weights. We show that these additions improve the normalized edit distance of the network by a large margin and produce a better scene text recognition framework for Telugu text. The recognizer performs well under complex text orientations and with the varying fonts, shapes, and highly varying characters present in Telugu text.
We also show that the network achieves better normalized edit distance and faster convergence on Telugu text data when it is initialized with weights trained on English scene text datasets. This emphasizes the importance of proper weight initialization and the benefits of fine-tuning for producing a robust framework for Telugu scene text detection.
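
The abstract reports results as normalized edit distance but does not define it. The sketch below shows one common convention, assumed here and not confirmed by the paper: one minus the mean Levenshtein distance divided by the longer string's length, so that a higher score is better. The names `levenshtein` and `normalized_edit_score` are hypothetical, and distances are counted over Unicode code points, so a Telugu consonant and its vowel sign count as two symbols.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings, counted over Unicode code points."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalized_edit_score(predictions: Sequence[str], targets: Sequence[str]) -> float:
    """Assumed convention: 1 - mean(distance / max(len(pred), len(target)))."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for pred, target in zip(predictions, targets):
        longest = max(len(pred), len(target))
        if longest > 0:  # two empty strings contribute a distance of zero
            total += levenshtein(pred, target) / longest
    return 1.0 - total / len(predictions)


if __name__ == "__main__":
    # One dropped vowel sign in the prediction lowers the score below 1.0.
    print(normalized_edit_score(["తెలగు"], ["తెలుగు"]))
```

Under this assumed convention, a perfect recognizer scores 1.0 and every character-level error lowers the score in proportion to the word length.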
