Multimodal recognition of visual concepts using histograms of textual concepts and selective weighted late fusion scheme

Ningning Liu, Emmanuel Dellandréa, Liming Chen, Chao Zhu, Yu Zhang, Charles-Edmond Bichot, Stéphane Bres, Bruno Tellez

doi:10.1016/j.cviu.2012.10.009

Abstract

The text associated with images provides valuable semantic meanings about image content that can hardly be described by low-level visual features. In this paper, we propose a novel multimodal approach to automatically predict the visual concepts of images through an effective fusion of textual features along with visual ones. In contrast to the classical Bag-of-Words approach which simply relies on term frequencies, we propose a novel textual descriptor, namely the Histogram of Textual Concepts (HTC), which accounts for the relatedness of semantic concepts in accumulating the contributions of words from the image caption toward a dictionary. In addition to the popular SIFT-like features, we also evaluate a set of mid-level visual features, aiming at characterizing the harmony, dynamism and aesthetic quality of visual content, in relationship with affective concepts. Finally, a novel selective weighted late fusion (SWLF) scheme is proposed to automatically select and weight the scores from the best features according to the concept to be classified. This scheme proves particularly useful for the image annotation task with a multi-label scenario. Extensive experiments were carried out on the MIR FLICKR image collection within the ImageCLEF 2011 photo annotation challenge. Our best model, which is a late fusion of textual and visual features, achieved a MiAP (Mean interpolated Average Precision) of 43.69% and ranked 2nd out of 79 runs. We also provide comprehensive analysis of the experimental results and give some insights for future improvements.
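The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation; it is only one plausible reading of the abstract. The sketch assumes that an HTC bin accumulates a word-to-concept relatedness score over all caption words, and that the fusion step ranks features by validation average precision, keeps the best-performing prefix, and weights each kept feature by its normalized average precision. The function names (`htc`, `average_precision`, `select_and_weight`), the relatedness callback, and the use of plain (non-interpolated) average precision are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence


def htc(caption_words: Sequence[str],
        dictionary: Sequence[str],
        relatedness: Callable[[str, str], float]) -> List[float]:
    """Histogram over dictionary concepts: each bin sums the relatedness
    of every caption word to that concept, then the histogram is
    L1-normalized. `relatedness` is a hypothetical callback."""
    hist = [sum(relatedness(w, c) for w in caption_words) for c in dictionary]
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Plain (non-interpolated) average precision of a ranked list."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def select_and_weight(val_scores: Dict[str, List[float]],
                      labels: Sequence[int]) -> Dict[str, float]:
    """For ONE concept: rank features by validation AP, try the top-k
    prefix for every k, and return the AP-proportional weights of the
    prefix whose weighted score sum gives the best fused AP."""
    aps = {name: average_precision(s, labels) for name, s in val_scores.items()}
    order = sorted(aps, key=aps.get, reverse=True)
    best_ap, best_weights = -1.0, {}
    for k in range(1, len(order) + 1):
        top = order[:k]
        total = sum(aps[name] for name in top)
        if total == 0:
            break  # no feature carries any signal for this concept
        weights = {name: aps[name] / total for name in top}
        fused = [sum(weights[name] * val_scores[name][i] for name in top)
                 for i in range(len(labels))]
        fused_ap = average_precision(fused, labels)
        if fused_ap > best_ap:  # strict: ties keep the smaller feature set
            best_ap, best_weights = fused_ap, weights
    return best_weights
```

In this sketch the selection runs once per concept, so a concept that is well described by captions can end up relying mostly on textual scores while another leans on visual ones, which matches the "according to the concept to be classified" wording in the abstract.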

Journal: Computer Vision and Image Understanding
Publication Date: Dec 11, 2012
Citations: 100

Similar Papers

Bimodal fusion of low-level visual features and high-level semantic features for near-duplicate video clip detection
Hyun-Seok Min ... Yong Man Ro
Signal Processing: Image Communication | VOL. 26
15 Apr 2011

A Selective Weighted Late Fusion for Visual Concept Recognition
Ningning Liu ... Charles-Edmond Bichot
01 Jan 2012

A Selective Weighted Late Fusion for Visual Concept Recognition
Ningning Liu ... Liming Chen
01 Jan 2014

Human action recognition using fusion of features for unconstrained video sequences
Chirag I Patel ... Ripal Patel
Computers & Electrical Engineering | VOL. 70
18 Jun 2016
