Abstract

Script recognition is a necessary preliminary step for text recognition. In the deep learning era, this task has two essential requirements: a large labeled dataset for training and sufficient computational resources to train the models. When either is limited, alternative methods are needed. This motivates transfer learning, in which knowledge from a model previously trained on a benchmark dataset is reused on a smaller dataset for a different task, saving computational power because only a fraction of the model's parameters need to be trained. Here we study two pre-trained models and fine-tune them for script classification tasks. First, a pre-trained VGG-16 model is fine-tuned on the publicly available CVSI-15 and MLe2e datasets for script recognition. Second, a model that performs well on the Devanagari handwritten characters dataset is adopted and fine-tuned on the Kaggle Devanagari numeral dataset for numeral recognition. The performance of the proposed fine-tuned models depends on whether the target dataset is similar to or dissimilar from the original dataset, and it is analyzed with widely used optimizers.

Highlights

  • Script identification in documents and scene images is an essential starting point for text recognition under multi-lingual scenarios

  • Two pre-trained models are used: the first is VGG-16 trained on the ImageNet dataset, and the second is trained on the Devanagari handwritten characters dataset (DHCD)

  • The first model is fine-tuned on the Competition on Video Script Identification (CVSI-15) and MLe2e datasets, which differ from the original dataset, for script classification tasks


Summary

Introduction

Script identification in documents and scene images is an essential starting point for text recognition under multi-lingual scenarios. Deep learning models require high computational power and a very large dataset for training; when computational resources are limited and only a small dataset is available, the trained model performs poorly on real-world test data. To overcome this issue, we can reuse the weights of a model that was well trained on a very large benchmark dataset such as ImageNet, which contains millions of images for the classification task. This motivates the exploration of transfer learning, where the knowledge of a pre-trained model is applied to another dataset for another task, saving computational power and removing the need for a very large training dataset. The pre-trained models used are described in the following sections.

(Figure: fine-tuning architecture, with VGG block 1 frozen.)
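As a hedged illustration of this strategy, the sketch below loads an ImageNet-pretrained VGG-16 in Keras, freezes its first convolutional block (the figure fragment above indicates block 1 is frozen), and attaches a new classification head for a target script dataset. The input size, head layers, optimizer, and 10-class output (e.g., the ten scripts of CVSI-15) are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of fine-tuning an ImageNet-pretrained VGG-16 for script
# classification. Frozen blocks, input size, head layers, optimizer, and
# class count are assumptions for illustration only.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_SCRIPTS = 10  # assumption: e.g., the 10 script classes of CVSI-15

# VGG-16 convolutional base with ImageNet weights, without the top classifier.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))

# Freeze the first convolutional block; later blocks remain trainable.
for layer in base.layers:
    if layer.name.startswith("block1"):
        layer.trainable = False

# New classification head trained on the target script dataset.
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_SCRIPTS, activation="softmax"),
])

# The abstract compares widely used optimizers; Adam is one common choice.
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

With data pipelines for CVSI-15 or MLe2e prepared, `model.fit` would then update only the unfrozen convolutional blocks and the new head, which is the parameter saving the abstract refers to.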

Pre-Trained VGG-16 Model
Pre-Trained Model on DHCD
Method
Findings
Conclusions and Future Work
