Abstract

Image annotation is a challenging task due to the semantic gap between low-level visual features and high-level human concepts. Most previous annotation methods treat the task as a multilabel classification problem. However, these methods often suffer from poor accuracy and efficiency when plentiful visual variations and large semantic vocabularies are encountered. In this paper, we focus on two-level image annotation by integrating both global and local visual features with semantic hierarchies, in an effort to simultaneously learn annotation correspondences in a relatively small and most relevant subspace. Given an image, the two-level task comprises scene classification for the whole image and object labeling for its regions. For scene classification, we first define several specific scenes that describe most cases in the given image data, and then apply support vector machines (SVMs) to the global features. For region labeling, we first formulate a set of abstract nouns in accordance with WordNet to define the relevant objects, and then apply local support tensor machines (LSTMs) to high-order regional features. By introducing a new conditional random field (CRF) model that exploits multiple correlations with respect to scene–object hierarchies and object–object relationships, our system achieves a more hierarchical and coherent description of image contents than simpler image annotation approaches. Experimental results on the MSRC and SAIAPR datasets, obtained by comparing with several state-of-the-art methods, validate the superiority of using multiple visual features and prior semantic correlations for image annotation.
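The scene-classification stage described above (an SVM applied to global image features) can be sketched as follows. This is an illustrative example only, not the authors' implementation: the synthetic feature vectors, scene names, and cluster parameters are all assumptions standing in for real global descriptors such as color or texture histograms.

```python
import numpy as np
from sklearn.svm import SVC

# Illustrative only: synthetic "global feature" vectors stand in for real
# image descriptors; the scene names are hypothetical category labels.
rng = np.random.default_rng(0)
n_per_scene, dim = 50, 16
scenes = ["beach", "forest", "city"]

X, y = [], []
for label, scene in enumerate(scenes):
    # Give each scene a distinct, well-separated cluster center.
    center = np.full(dim, label * 3.0)
    X.append(center + rng.normal(scale=0.5, size=(n_per_scene, dim)))
    y += [label] * n_per_scene
X, y = np.vstack(X), np.array(y)

# Multiclass SVM on the global features (scikit-learn's SVC handles the
# multiclass case via one-vs-one decomposition by default).
clf = SVC(kernel="rbf", gamma="scale").fit(X, y)

# Classify a query image whose features lie near the "city" cluster.
query = np.full((1, dim), 2 * 3.0) + rng.normal(scale=0.5, size=(1, dim))
pred = scenes[clf.predict(query)[0]]
print(pred)
```

In the paper's full pipeline, this scene prediction would then constrain the region-labeling stage through the scene–object hierarchy in the CRF model, rather than being used in isolation.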
