Abstract

Graphical text in natural images plays an important role in conveying information in many fields, such as communication, education, and entertainment. Recognizing text in scene images is challenging due to the inherent complexity of the images. Text recognition in natural images involves script identification, which in turn requires text localization. This is not trivial for natural scene images due to the presence of disparate foreground/background components. For scene images like movie posters, the challenge is even greater. The difficulties are aggravated by the composite characteristics of posters, such as complex graphical backgrounds and the presence of different texts, including the movie title, names of actors, producers, and directors, and the tagline. These texts exhibit diverse fonts and variations in color, size, orientation, and texture. In this work, an M-EAST (modified EAST) model, based on the EAST (efficient and accurate scene text detector) model, is proposed for text localization. A novel movie-title extraction method is then used to separate the title from the extracted text pool. Finally, the title script is identified using a shallow convolutional neural network (SCNN)-based architecture, chosen to ensure functionality in low-resource environments. Experiments were performed on a dataset of movie-poster images from the Tollywood, Bollywood, and Hollywood industries, and a highest accuracy of 99.82% was obtained. The system performed better than previously reported techniques.
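The abstract describes a three-stage pipeline: text localization (M-EAST), title extraction, and script classification with a shallow CNN. As a rough illustration of the final stage only, the following is a minimal sketch of a shallow CNN script classifier in PyTorch; the layer sizes, input resolution, and three-script output (e.g., Bengali, Devanagari, Latin) are assumptions for illustration, not the architecture reported in the paper.

```python
import torch
import torch.nn as nn


class ShallowCNN(nn.Module):
    """Illustrative shallow CNN for script classification.

    Two conv blocks + one linear layer: a deliberately small model,
    in the spirit of the paper's low-resource SCNN. All hyperparameters
    here are hypothetical.
    """

    def __init__(self, num_scripts: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # 32x32 -> 32x32
            nn.ReLU(),
            nn.MaxPool2d(2),                             # -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # -> 8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_scripts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, 1, 32, 32) grayscale title crops
        x = self.features(x)
        return self.classifier(x.flatten(1))  # (N, num_scripts)


model = ShallowCNN()
logits = model(torch.zeros(2, 1, 32, 32))
print(logits.shape)  # (2, 3): one score per candidate script
```

A small model like this trades some representational capacity for a low parameter count, which is consistent with the stated goal of running in low-resource environments.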

Highlights

  • In human-computer interaction, the success of smartphones and the broad demand for content-based image search/understanding have greatly amplified the role of text recognition

  • Khalil et al. [27] discussed a method for script identification in scene text images that augments the EAST model with a fully connected network module

  • Comparing the performance of the EAST and M-EAST models (Tables 6 and 13), the F-score improved by 4.03%, 1.93%, and 2.70% for Tollywood, Bollywood, and Hollywood poster images, respectively


Summary

INTRODUCTION

In human-computer interaction, the success of smartphones and the broad demand for content-based image search/understanding have greatly amplified the role of text recognition. Natural scene images are not as graphically rich as movie posters, which makes the latter more challenging [1], [2] for text extraction and script identification. It is important to extract the movie title from the recovered text because the title represents genre, category, etc. [16, 17]. This extraction is followed by script identification, which is a challenging task due to the presence of graphics-rich components. Audiences with only spoken proficiency in a particular language face problems reading and understanding the title of a movie poster written in that language's script. This is especially true in a multi-script/multilingual country like India, where movies are often released in different languages in the same multiplex. The paper is structured as follows: Sections II and III present the literature study and the proposed work, respectively; Section IV discusses the experimental procedure; and Section V presents the conclusion of the article.

LITERATURE STUDY
PRE-PROCESSING
M-EAST
TITLE BOX EXTRACTION
SHALLOW CONVOLUTIONAL NEURAL NETWORK ARCHITECTURE
EVALUATION PROTOCOL
Method
Findings
CONCLUSION