Abstract

Owing to advances in multimedia technology, image and video data are now available and used on an enormous scale, and indexing and retrieving these data requires efficient techniques. Automatic keyword generation for images is therefore an active research focus. In general, conventional automatic annotation methods perform worse than deep learning methods, in which annotation is recast as captioning. In this paper, we propose a new model, CSL Net (CSLN), which combines convolutional squeeze-and-excitation blocks with Bi-LSTM blocks to predict tags for images. The proposed model is evaluated on several benchmark datasets: CIFAR10, Corel5K, ESPGame and IAPRTC12. The proposed work yields better results than existing methods in terms of precision, recall and accuracy.
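
The abstract reports precision and recall for tag prediction but does not say how they are computed. A common convention on Corel5K-style annotation benchmarks works in two steps. First, each image is annotated with its top-k scoring tags (k = 5 is typical). Second, precision and recall are computed per label and averaged, together with N+, the number of labels recalled at least once. The sketch below assumes that convention. The function names and toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Set, Tuple


def top_k_tags(scores: Sequence[float], vocab: Sequence[str], k: int = 5) -> Set[str]:
    """Annotate one image with the k tags that have the highest predicted scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return {vocab[i] for i in ranked[:k]}


def per_label_precision_recall(
    predicted: List[Set[str]], truth: List[Set[str]]
) -> Tuple[float, float, int]:
    """Macro-averaged per-label precision and recall, plus N+.

    Assumed convention: average over the labels that occur in the ground truth;
    N+ counts labels that are correctly predicted for at least one image.
    """
    labels = sorted(set().union(*truth))
    if not labels:
        return 0.0, 0.0, 0
    precision_sum = recall_sum = 0.0
    n_plus = 0
    for tag in labels:
        tp = sum(1 for p, t in zip(predicted, truth) if tag in p and tag in t)
        n_pred = sum(1 for p in predicted if tag in p)
        n_true = sum(1 for t in truth if tag in t)  # > 0 because tag comes from truth
        precision_sum += tp / n_pred if n_pred else 0.0  # never-predicted tag counts as 0
        recall_sum += tp / n_true
        if tp > 0:
            n_plus += 1
    return precision_sum / len(labels), recall_sum / len(labels), n_plus


# Toy example: two images, four-tag vocabulary, k = 2 tags per image.
vocab = ["sky", "water", "tree", "car"]
scores = [[0.9, 0.1, 0.8, 0.2], [0.2, 0.7, 0.1, 0.9]]
predicted = [top_k_tags(s, vocab, k=2) for s in scores]
truth = [{"sky", "tree"}, {"water", "tree"}]
p, r, n_plus = per_label_precision_recall(predicted, truth)
# Here precision is 1.0, recall is 2.5 / 3 ("tree" is found in only one of two images), N+ is 3.
```

Averaging per label rather than per image keeps rare tags from being swamped by frequent ones. The paper does not state which averaging it uses.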

Highlights

  • Automatic image annotation has received much attention in computer vision research because image search requires the content of images to be described.

  • The proposed network, named CSL Net, combines a convolution layer, a squeeze-and-excitation block and an LSTM.

  • We propose CSL Net to overcome these challenges.
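
A squeeze-and-excitation (SE) block reweights the channels of a convolutional feature map. The minimal pure-Python sketch below shows the standard SE computation in three steps. First, global average pooling "squeezes" each channel to one number. Second, two fully connected layers form a bottleneck: C to C/r with ReLU, then C/r to C with sigmoid. Third, each channel is scaled by its resulting weight. The paper does not give CSL Net's reduction ratio r, layer sizes or weights, so the toy shapes and values here are assumptions. A real model would use a deep learning framework.

```python
import math
from typing import List

Matrix = List[List[float]]


def squeeze_excite(feature_map: List[Matrix], w1: Matrix, b1: List[float],
                   w2: Matrix, b2: List[float]) -> List[Matrix]:
    """SE block on a C x H x W feature map (a list of C channels, each H x W).

    w1/b1: bottleneck FC layer, shape (C/r) x C; w2/b2: expansion FC layer, shape C x (C/r).
    """
    # Squeeze: global average pooling gives one descriptor z[c] per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation, step 1: FC layer C -> C/r followed by ReLU.
    h = [max(0.0, sum(w * zc for w, zc in zip(row, z)) + b) for row, b in zip(w1, b1)]
    # Excitation, step 2: FC layer C/r -> C followed by sigmoid, so each weight lies in (0, 1).
    s = [1.0 / (1.0 + math.exp(-(sum(w * hk for w, hk in zip(row, h)) + b)))
         for row, b in zip(w2, b2)]
    # Scale: multiply every value in channel c by its channel weight s[c].
    return [[[s_c * v for v in row] for row in ch] for s_c, ch in zip(s, feature_map)]


# Toy example (hypothetical weights): C = 2 channels of size 2 x 2, r = 2, so one hidden unit.
fmap = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
out = squeeze_excite(fmap, w1=[[0.5, 0.5]], b1=[0.0], w2=[[2.0], [-2.0]], b2=[0.0, 0.0])
# Channel 0 is scaled by sigmoid(3) (emphasized); channel 1 by sigmoid(-3) (suppressed).
```

The spatial layout of each channel is unchanged. Only the relative emphasis between channels changes, which is how an SE block highlights informative features in the convolved input.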

INTRODUCTION

Automatic image annotation has received much attention in computer vision research because image search requires the content of images to be described. In this smartphone era, the number of digital images from social media, blogs, CCTV footage and similar sources is growing rapidly. Describing an image in natural language as a meaningful sentence is called captioning. In deep learning annotation models, a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN) perform well at encoding image features and decoding them into a natural language representation [2]. Long Short-Term Memory (LSTM) was later introduced to preserve long-range dependencies, which a plain RNN cannot, and it performs well at natural language generation [13].

Deep learning image annotation typically applies a convolutional model for feature extraction and a classifier to classify images. In the proposed model, the SE block extracts the salient features from the convolved input, and the Bi-LSTM is used to train the network and annotate images. In phase 3, the extracted information is combined and used as features to train the model to generate tags.
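
Two mechanisms in the paragraph above can be made concrete. The first is how an LSTM preserves dependencies through a gated cell state. The second is how a Bi-LSTM combines a forward and a backward pass. The hedged pure-Python sketch below implements a textbook LSTM cell (input, forget and output gates plus a candidate state) and a bidirectional wrapper that concatenates both directions at each step. The parameter layout (a dict keyed by gate) and the toy weights are hypothetical. The paper does not say how CSL Net feeds SE features into its Bi-LSTM or maps the outputs to tags.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
# Hypothetical layout: gate name -> (input weights W, recurrent weights U, bias b).
Params = Dict[str, Tuple[Matrix, Matrix, Vector]]


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def _affine(W: Matrix, U: Matrix, b: Vector, x: Vector, h: Vector) -> Vector:
    # Computes W x + U h + b for one gate.
    return [sum(w * xi for w, xi in zip(W[k], x)) + sum(u * hi for u, hi in zip(U[k], h)) + b[k]
            for k in range(len(b))]


def lstm_step(p: Params, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
    i = [_sigmoid(v) for v in _affine(*p["i"], x, h)]   # input gate
    f = [_sigmoid(v) for v in _affine(*p["f"], x, h)]   # forget gate: how much old memory to keep
    o = [_sigmoid(v) for v in _affine(*p["o"], x, h)]   # output gate
    g = [math.tanh(v) for v in _affine(*p["g"], x, h)]  # candidate cell content
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def run_lstm(p: Params, xs: Sequence[Vector], hidden: int) -> List[Vector]:
    h, c = [0.0] * hidden, [0.0] * hidden
    outputs = []
    for x in xs:
        h, c = lstm_step(p, x, h, c)
        outputs.append(h)
    return outputs


def bi_lstm(p_fwd: Params, p_bwd: Params, xs: Sequence[Vector], hidden: int) -> List[Vector]:
    fwd = run_lstm(p_fwd, xs, hidden)
    # Run over the reversed sequence, then flip back so index t lines up with the input.
    bwd = run_lstm(p_bwd, list(reversed(xs)), hidden)[::-1]
    return [hf + hb for hf, hb in zip(fwd, bwd)]  # concatenate both directions per step


def make_params(w: float) -> Params:
    # Hypothetical toy parameters: input weight w, zero recurrent weights and biases.
    return {gate: ([[w]], [[0.0]], [0.0]) for gate in "ifog"}


out_a = bi_lstm(make_params(1.0), make_params(1.0), [[1.0], [0.0]], hidden=1)
out_b = bi_lstm(make_params(1.0), make_params(1.0), [[1.0], [5.0]], hidden=1)
```

At step 0, the forward half of `out_a` and `out_b` is identical, because it has only seen the first input. The backward half differs, because it has already read the later input. This is the extra context a Bi-LSTM adds over a unidirectional LSTM.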
