Modeling correlation between multi-modal continuous words for pLSA-based video classification

Cencen Zhong, Zhenjiang Miao

doi:10.1109/icip.2014.7025874

Abstract

Because probabilistic Latent Semantic Analysis (pLSA) handles only discrete quantities, pLSA with Gaussian Mixtures (GM-pLSA) extends it to continuous feature spaces by treating each continuous feature as a continuous word. However, GM-pLSA provides no clear way to model multi-modal features and neglects the intrinsic correlation between these continuous words. In this paper, we present a graph regularized multi-modal GM-pLSA (GRMMGM-pLSA) model that incorporates the correlation between multi-modal continuous words into model learning. First, multiple GMMs are adopted, each depicting the distribution of continuous words from one modality; then, a graph regularizer is introduced to capture the word correlation. For video classification, GRMMGM-pLSA performs feature mapping using both the multi-modal visual features of sub-shots and the word correlation given by temporal consistency between sub-shots. Experiments on YouTube videos show the effectiveness of the proposed model.

Full Text