Transductive multi-distance learning for video search

Songhao Zhu, Zhiwei Liang, Yuncai Liu

doi:10.1007/s10044-010-0196-4

Abstract

Graph-based semi-supervised learning approaches have proven effective and efficient at addressing the insufficiency of labeled training data in many real-world application areas, such as video annotation. However, the pair-wise similarity metric between samples, a significant factor in these algorithms, has not been fully investigated. Specifically, existing approaches estimate the pair-wise similarity between two samples from the spatial property of video data alone; the temporal property, an essential characteristic of video data, is not embedded into the similarity measure. Accordingly, this paper proposes a novel framework for video annotation called Joint Spatio-Temporal Correlation Learning (JSTCL). The framework simultaneously takes into account both the spatial and temporal properties of video data to improve the estimation of pair-wise similarity. We apply the proposed framework to video annotation and report superior performance compared to key existing approaches on the benchmark TRECVID data set.
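
To make the idea of a joint spatial and temporal similarity concrete, the following is a minimal sketch and not the authors' JSTCL formulation, which the abstract does not specify. It assumes each video shot has a feature vector (the spatial property) and a timestamp (the temporal property), combines the two through a product of Gaussian kernels, and feeds the resulting graph to a standard iterative label propagation scheme. The function names, the kernel choice, and the parameters sigma_s, sigma_t, and alpha are all illustrative assumptions.

```python
import math


def joint_similarity(features, times, sigma_s=0.5, sigma_t=2.0):
    """Pair-wise similarity from feature distance and temporal distance.

    Assumption for illustration: the two cues are combined as a product
    of Gaussian kernels. The paper may combine them differently.
    """
    n = len(features)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d_feat = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
            d_time = (times[i] - times[j]) ** 2
            w = math.exp(-d_feat / (2 * sigma_s ** 2)) * math.exp(-d_time / (2 * sigma_t ** 2))
            W[i][j] = W[j][i] = w  # symmetric graph, zero diagonal
    return W


def propagate(W, y, alpha=0.9, iters=200):
    """Generic graph-based label propagation: f <- alpha * S f + (1 - alpha) * y.

    S is the symmetrically normalized similarity matrix D^{-1/2} W D^{-1/2}.
    """
    n = len(W)
    deg = [max(sum(row), 1e-12) for row in W]  # guard against isolated nodes
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    f = list(y)
    for _ in range(iters):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f


# Hypothetical toy data: four shots, two labeled (+1 and -1), two unlabeled (0).
features = [[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [1.1, 0.0]]
times = [0.0, 1.0, 10.0, 11.0]
scores = propagate(joint_similarity(features, times), [1.0, 0.0, 0.0, -1.0])
```

In this sketch, two shots are strongly connected only when they are close in both feature space and time, so an unlabeled shot tends to inherit the label of a labeled shot that is its neighbor in both senses.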
