Efficient Heuristic Methods for Multimodal Fusion and Concept Fusion in Video Concept Detection

Jie Geng, Zhenjiang Miao, Xiao-Ping Zhang

doi:10.1109/tmm.2015.2398195

Abstract

Semantic models are widely used to bridge the semantic gap between low-level features and high-level features in video concept indexing. Multimodal fusion and concept fusion are two commonly used approaches to building semantic models. In previous work, domain adaptation has been neglected in multimodal fusion, and many probability-maximization-based and unsupervised concept fusion methods are counterintuitive because they do not incorporate subjective human intuition. In this paper, we present a new two-stage semantic model that combines multimodal fusion and concept fusion while incorporating human heuristics. In the multimodal fusion model, we employ a new generic unsupervised method, namely, domain adaptive linear combination (DALC), which updates the linear combination (LC) weights by incorporating the differences in element distributions between the training and testing domains. In the concept fusion model, a novel mechanical node equilibrium (NE) model is developed, which uses forces to model concept correlations and update the scores of the concepts represented by nodes. The NE model is intuitive and can incorporate multiple kinds of correlations simultaneously to construct a more sophisticated semantic structure. Compared to other state-of-the-art supervised and unsupervised methods, the new model can use either unsupervised or supervised factors to significantly improve the mean inferred average precision (MAP) performance on all datasets.

Full Text