Abstract

Automatic image annotation methods are extremely beneficial for image search, retrieval, and organization systems. The lack of strict correlation between semantic concepts and visual features, referred to as the semantic gap, is a huge challenge for annotation systems. In this paper, we propose an image annotation model that incorporates contextual cues collected from sources both intrinsic and extrinsic to images, to bridge the semantic gap. The main focus of this paper is a large real-world data set of news images that we collected. Unlike standard image annotation benchmark data sets, our data set does not require human annotators to generate artificial ground truth descriptions after data collection, since our images already include contextually meaningful and real-world captions written by journalists. We thoroughly study the nature of image descriptions in this real-world data set. News image captions describe both visual contents and the contexts of images. Auxiliary information sources are also available with such images in the form of news article and metadata (e.g., keywords and categories). The proposed framework extracts contextual-cues from available sources of different data modalities and transforms them into a common representation space, i.e., the probability space. Predicted annotations are later transformed into sentence-like captions through an extractive framework applied over news articles. Our context-driven framework outperforms the state of the art on the collected data set of approximately 20 000 items, as well as on a previously available smaller news images data set.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.