Abstract

Developing systems for interpreting visuals such as images and videos is a challenging but important task, and such systems need to be developed and evaluated on benchmark datasets. This study addresses this challenge with an STN-OCR model composed of deep neural networks (DNNs) and Spatial Transformer Networks (STNs). The architecture consists of two stages: a localization network and a recognition network. The localization network finds and localizes text regions and generates a sampling grid, while the recognition network takes these text regions as input and learns to recognize text, including low-resolution, curved, and multi-oriented text. Because deep learning-based approaches require large amounts of data to train effectively, this study uses two benchmark datasets, Street View House Numbers (SVHN) and International Conference on Document Analysis and Recognition (ICDAR) 2015, to evaluate the system. The STN-OCR model achieves better results than prior work on these datasets.
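The two-stage design described above can be illustrated with a minimal sketch, not the authors' exact STN-OCR implementation: a localization network predicts affine transformation parameters, a differentiable sampling grid extracts and rectifies the text region, and a recognition network classifies the rectified patch. The use of PyTorch, the layer sizes, the 64x64 grayscale input, and the 10-class output (e.g. SVHN digits) are illustrative assumptions.

```python
# Minimal sketch of an STN-based localization + recognition pipeline.
# All layer sizes and the input resolution are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalizationNet(nn.Module):
    """Stage 1: predicts 2x3 affine parameters that select a text region."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=7), nn.MaxPool2d(2), nn.ReLU(),
            nn.Conv2d(8, 10, kernel_size=5), nn.MaxPool2d(2), nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Linear(10 * 12 * 12, 32), nn.ReLU(),
            nn.Linear(32, 6),
        )
        # Start from the identity transform so training begins with "no crop".
        self.fc[-1].weight.data.zero_()
        self.fc[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        h = self.features(x)
        theta = self.fc(h.flatten(1))
        return theta.view(-1, 2, 3)

class RecognitionNet(nn.Module):
    """Stage 2: classifies the rectified text patch (e.g. one SVHN digit)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, num_classes),
        )

    def forward(self, x):
        return self.net(x)

class STNOCRSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.localization = LocalizationNet()
        self.recognition = RecognitionNet()

    def forward(self, x):
        theta = self.localization(x)                      # where is the text?
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        patch = F.grid_sample(x, grid, align_corners=False)  # differentiable crop
        return self.recognition(patch)                    # what does it say?

if __name__ == "__main__":
    model = STNOCRSketch()
    dummy = torch.randn(4, 1, 64, 64)   # batch of grayscale crops (assumption)
    print(model(dummy).shape)           # -> torch.Size([4, 10])
```

Because the sampling grid is differentiable, both stages can be trained end to end from a recognition loss alone, which is the key property that lets the localization stage learn where the text is without explicit box supervision.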

Highlights

  • Text detection from scene images is becoming a focused area of research

  • Experiments evaluate the study's architecture on two benchmark datasets: International Conference on Document Analysis and Recognition (ICDAR) 2015 [40] and Street View House Numbers (SVHN)

  • The ICDAR 2015 dataset, used in the Robust Reading Competition, contains a total of 1500 images with over 10k annotations; 1000 images are used for training and the remaining 500 for testing


Summary

Introduction

Text detection from scene images is becoming a focused area of research. It has attracted many researchers [1]–[3] from the computer vision community due to its varied applications, such as tagging people in security camera footage, understanding street signs for navigation, sign recognition in driver assistance systems, vehicle identification, navigation aids for people with low vision, and processing bank cheques. Digital devices such as smartphones and cameras are used to produce a large amount of multimedia content (images and videos) across the world, and it is easy to capture the world's scenery in digital images through mobile devices as their prices decrease and their performance increases. Robust solutions, however, are still an open topic of discussion among researchers.

