Abstract

Abstract Text is an important invention of humanity, which plays a key role in human life, so far from dark ages. Text in image is closely related to the scene or a product and is widely used in vision based application. In this paper we are addressing the problem of visual understanding with text. The main focus is combining textual cues and visual cues in deep neural network. First the text is recognized and classified from the image. Then we combine the attended word embedding and visual feature vector which are then optimized by CNN for Fine-grained image classification. We carried out the experiments on soft drink dataset in Pakistan. The results shows that the system achieves significant performance which can be potentially beneficial for real world application e.g. product search.

Highlights

  • Fine-Grained image classification [1, 28] is a real-world emerging problem and it has received great attention from research communities around the globe

  • Fine-grained image classification [1] is being applied for natural scene classification

  • In the paper of Yingying Zhu et al [14], they discussed in detail the recent advances and future trends for scene text detection and recognition

Read more

Summary

INTRODUCTION

Fine-Grained image classification [1, 28] is a real-world emerging problem and it has received great attention from research communities around the globe. Using fine-grained image classification [28] techniques on the soft drink dataset is an area that has received limited attention from the researchers, whereas this area has exciting applications in restaurants and shops, where automated orders could be places once a specific brand of soft drink is going to be out of stock. Yao et al [11] present a unified framework for detecting and recognizing the text in images to handle the texts of different orientations They provide a method for ‘search dictionary’ to correct the recognition errors. The works of Shiva kumara et al [12] and C Yao et al [13] realize the significance of multi-oriented text detection and recognition to the research community. In the paper of Yingying Zhu et al [14], they discussed in detail the recent advances and future trends for scene text detection and recognition

Text detection and recognition
Fine-Grained Classification
Attention Mechanism
Multimodal Fusion
PROPOSED METHODOLOGY
Dataset
Implementation Details
CONCLUSION
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call