Abstract

The performance of text-based image retrieval depends heavily on tedious and inefficient manual annotation. To generate image keywords automatically, extensive work has been done in the area of image annotation. However, handling the diverse keywords of an image and choosing appropriate features remain two difficult problems. To address these challenges, we propose the multi-view stacked auto-encoder (MVSAE) framework to establish correlations between low-level visual features and high-level semantic information. In this paper, a new method that incorporates keyword frequencies and log-entropy is presented to address the imbalanced distribution of keywords. To exploit the complementarity among diverse visual descriptors, we apply multi-view learning to search for label-specific features, from which the image keywords are finally produced. Extensive experiments on three popular data sets demonstrate that the proposed framework achieves effective and favorable performance for image annotation.
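The abstract does not give the exact weighting formula, but log-entropy weighting is a standard scheme from latent semantic analysis, where a keyword's local log frequency is scaled by a global weight that shrinks toward zero as the keyword spreads evenly across all images. A minimal sketch under that assumption follows; the function name and the keyword-by-image count matrix layout are hypothetical, not taken from the paper.

    import numpy as np

    def log_entropy_weights(counts):
        """Log-entropy weighting for a keyword-by-image count matrix.

        counts: (K, N) array; counts[i, j] is the occurrence count of
        keyword i in image j's annotation (often 0/1 for image tags).
        Returns a (K, N) matrix: local log weight * global entropy weight.
        Rare, informative keywords receive higher global weights, which
        offsets the imbalanced keyword distribution.
        """
        counts = np.asarray(counts, dtype=float)
        K, N = counts.shape
        gf = counts.sum(axis=1, keepdims=True)      # global frequency of each keyword
        p = np.divide(counts, gf,
                      out=np.zeros_like(counts), where=gf > 0)
        plogp = np.where(p > 0, p * np.log(p), 0.0)
        entropy = -plogp.sum(axis=1, keepdims=True) # keyword entropy over images
        g = 1.0 - entropy / np.log(N)               # global weight in [0, 1]
        l = np.log1p(counts)                        # local log weight
        return l * g

A keyword that appears in every image has maximal entropy and a global weight near zero, while a keyword concentrated in a few images keeps a weight near one, so the weighted matrix emphasizes the rarer, more discriminative keywords.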
