Abstract

Scene text recognition has become an active research topic and remains a challenging task owing to complicated backgrounds and varying light intensities, colors, font styles, and sizes. Text extraction from natural scene images comprises two main processes: text detection and text recognition. Recent advances in Machine Learning (ML) and Deep Learning (DL) make it possible to automate both processes effectively, provided the model is trained properly. To this end, this paper presents an Automated DL empowered Text Detection model from Natural Scene Images (ADLTD-NSI), which covers both text detection and text recognition. Firstly, a single shot detector (SSD) with Inception-v2 as the baseline model is employed for text detection; SSD is an object detector based on the VGG-16 framework for feature map extraction, followed by six convolution layers. Secondly, a Convolutional Recurrent Neural Network (CRNN) is utilized for text recognition. The recurrent layers in the CRNN model use long short-term memory (LSTM) to encode the sequence of feature vectors. Lastly, Connectionist Temporal Classification (CTC) loss is applied to predict the text labels corresponding to the sequences produced by the recurrent layers. A wide range of experiments was carried out on benchmark COCO datasets, and the results are examined from several aspects. The experimental results show that the ADLTD-NSI technique outperforms the compared methods, with a maximum accuracy of 96.78%.
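To illustrate how a CTC-trained recognizer turns per-frame outputs from the recurrent layers into a text label, the following minimal sketch applies the standard CTC best-path (greedy) rule: take the top class per frame, merge consecutive repeats, then drop blanks. It is not the authors' implementation. The function name `ctc_greedy_decode`, the toy alphabet, the use of index 0 for the blank symbol, and the choice of greedy decoding are all assumptions made for illustration; the abstract does not state which decoding strategy is used.

```python
from itertools import groupby


def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Decode per-frame class scores into a string (CTC best-path rule).

    frame_scores: list of per-time-step score lists, one score per class.
    alphabet:     string whose i-th character is the label of class i;
                  the character at index `blank` is a placeholder (assumption).
    """
    # 1. Pick the highest-scoring class at every time step.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # 2. Merge runs of identical consecutive classes into one.
    collapsed = [idx for idx, _ in groupby(best_path)]
    # 3. Drop the blank symbol and map the remaining indices to characters.
    return "".join(alphabet[idx] for idx in collapsed if idx != blank)


if __name__ == "__main__":
    # Toy alphabet: index 0 is the blank, then 'a', 'b', 'c' (hypothetical).
    alphabet = "-abc"

    def one_hot(i, n=len(alphabet)):
        return [1.0 if j == i else 0.0 for j in range(n)]

    # Seven frames whose best path is c c - a a - b.
    frames = [one_hot(i) for i in [3, 3, 0, 1, 1, 0, 2]]
    print(ctc_greedy_decode(frames, alphabet))  # prints "cab"
```

The blank symbol is what lets the model emit genuinely doubled characters: a best path of `a - a` decodes to "aa", whereas `a a` collapses to a single "a".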
