Abstract

Image annotation has long been a challenging problem due to the well-known semantic gap between two heterogeneous information modalities: the visual modality, referring to low-level visual features, and the semantic modality, referring to high-level human concepts. To bridge the semantic gap, we present an extension of latent Dirichlet allocation (LDA), denoted class-specific Gaussian-multinomial latent Dirichlet allocation (csGM-LDA), in an effort to emulate the human visual perception system. An analysis of previous supervised LDA models shows that the topics discovered by generative LDA models are driven by general image regularities rather than by the semantic regularities needed for image annotation. To address this, csGM-LDA introduces class supervision at the level of visual features for multimodal topic modeling. The csGM-LDA model combines the labeling strength of topic supervision with the flexibility of topic discovery, and the modeling problem can be solved effectively by a variational expectation-maximization (EM) algorithm. Moreover, as natural images in annotation applications typically yield enormous amounts of high-dimensional data, an efficient descriptor based on a Laplacian regularized uncorrelated tensor representation is proposed to explicitly exploit the manifold structure of the high-order image space. Experimental results on two standard annotation datasets demonstrate the effectiveness of the proposed method in comparison with several state-of-the-art annotation methods.
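The paper ships no code, and we are not aware of a public csGM-LDA implementation. As a minimal, hedged sketch of the underlying idea only (standard LDA over a bag of visual words, fit by variational Bayes, not the authors' class-supervised Gaussian-multinomial model), the Python snippet below uses scikit-learn; the feature dimensions, vocabulary size, and topic count are all illustrative assumptions.

```python
# Minimal sketch (NOT the authors' csGM-LDA): plain LDA over a
# bag-of-visual-words matrix, fit with scikit-learn's variational Bayes,
# which is a variational EM scheme in the spirit of the paper's inference.
# All shapes and parameter values here are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)

# Stand-in for low-level visual features: 500 image regions x 64 dims.
region_features = rng.normal(size=(500, 64))

# Quantize regions into a visual vocabulary (here 50 "visual words").
kmeans = KMeans(n_clusters=50, n_init=10, random_state=0).fit(region_features)
words = kmeans.labels_.reshape(25, 20)   # pretend: 25 images x 20 regions each

# Bag-of-visual-words count matrix: one row per image.
counts = np.zeros((25, 50), dtype=int)
for i, img in enumerate(words):
    np.add.at(counts[i], img, 1)

# Discover latent topics and per-image topic proportions.
lda = LatentDirichletAllocation(n_components=10, random_state=0)
theta = lda.fit_transform(counts)        # per-image topic mixtures
print(theta.shape)                       # (25, 10)
```

This sketch covers only the unsupervised topic-discovery stage; the paper's model additionally injects class supervision at the visual-feature level, which is precisely what plain LDA lacks.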

Highlights

  • Automatic image annotation is a challenging task, concerned with understanding what we see in a visual scene, due to the well-known semantic gap [1]

  • We develop a new extension of latent Dirichlet allocation (LDA), coupled with a Laplacian regularized uncorrelated tensor representation, for learning semantics from image data

  • We propose a new three-level hierarchical probabilistic model that incorporates supervision into the extended LDA model, making annotation applications much more effective than previous LDA models

Introduction

Automatic image annotation is a challenging task, concerned with understanding what we see in a visual scene, due to the well-known semantic gap [1]. The goal of image annotation is to assign meaningful tags to an image that summarize its visual content. Such methods are becoming increasingly important given the growing collections of both private and publicly available images. The inter-tag similarity problem reveals that visual similarity does not always guarantee semantic similarity, which in general conflicts with the inherent assumption of many image annotation methods, e.g., methods [2,3] that propagate tags according to visual similarity. To cope with this problem, it is imperative to develop more discriminative visual features that can separate the visual content associated with different tags.
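To make the critiqued assumption concrete, the sketch below (illustrative only, not the specific methods of [2,3]) propagates tags from a query image's visually nearest annotated neighbors by voting; the features, tags, and thresholds are synthetic assumptions. Visually similar but semantically different neighbors will propagate wrong tags, which is exactly the inter-tag similarity problem described above.

```python
# Hedged sketch of visual-similarity tag propagation (illustrative only):
# a query image inherits every tag held by enough of its k nearest
# neighbors in feature space. All data below are synthetic assumptions.
import numpy as np
from collections import Counter

rng = np.random.default_rng(1)

# Synthetic "visual features" for 6 annotated images, plus their tag sets.
features = rng.normal(size=(6, 8))
tags = [{"sky", "sea"}, {"sky"}, {"car"}, {"car", "road"}, {"sea"}, {"road"}]

def propagate_tags(query, features, tags, k=3, min_votes=2):
    """Assign to `query` each tag carried by >= min_votes of its k nearest
    neighbors under Euclidean distance. If neighbors are visually close
    but semantically unrelated, the propagated tags are wrong -- the
    failure mode the inter-tag similarity problem points out."""
    dists = np.linalg.norm(features - query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(t for i in nearest for t in tags[i])
    return {t for t, v in votes.items() if v >= min_votes}

query = rng.normal(size=8)
print(propagate_tags(query, features, tags))
```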
