Abstract

Automatic image annotation provides a means for users to search image collections at the semantic level using natural-language queries. Statistical machine translation models have been applied successfully to automatic image annotation in the past. A problem with this approach is that, because the term-frequency distribution of annotation words is skewed, common words are overly favored, leaving little room for uncommon words in the resulting auto-annotations. Studies in information retrieval, by contrast, have shown that uncommon words are at least as important as common words, since they too appear frequently in users' queries. Unlike previous studies, which consider a single type of statistical translation model for automatic image annotation, in this paper we study two types of statistical translation models: a forward translation model, which translates visual information into textual words, and a backward translation model, which translates textual words into visual information. In particular, we propose a new statistical translation model, named the regularization-based symmetric statistical translation model, which combines the strengths of the forward and backward models to alleviate the problem of overly favoring common words. Our empirical studies on the Corel dataset show that the proposed model performs considerably better than the existing translation model and a state-of-the-art approach to automatic image annotation.
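The abstract does not give the model's exact formulation, but the core idea of scoring candidate annotation words with both a forward and a backward translation table can be sketched as follows. This is a minimal illustrative sketch only: the names p_w_given_b and p_b_given_w, the interpolation weight lam, and the log-space geometric combination are all assumptions for illustration, not the paper's actual regularization scheme.

    import numpy as np

    def annotate(blob_ids, p_w_given_b, p_b_given_w, lam=0.5, top_k=5):
        # p_w_given_b: (V, B) forward table, P(word | visual blob)   -- hypothetical name
        # p_b_given_w: (B, V) backward table, P(visual blob | word)  -- hypothetical name
        # blob_ids: indices of the blobs/regions detected in one image
        forward = p_w_given_b[:, blob_ids].mean(axis=1)    # forward score per word, shape (V,)
        backward = p_b_given_w[blob_ids, :].mean(axis=0)   # backward score per word, shape (V,)
        # Combine the two directions in log space: a word scores high only if
        # BOTH models support it, which damps the bias toward common words
        # that a forward-only model exhibits.
        score = lam * np.log(forward + 1e-12) + (1 - lam) * np.log(backward + 1e-12)
        return np.argsort(score)[::-1][:top_k]             # indices of the top-k annotation words

For instance, given translation tables Pf (V x B) and Pb (B x V), annotate([3, 17, 42], Pf, Pb) would return the five best-scoring annotation words for an image containing blobs 3, 17, and 42.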
