Abstract

We present a novel method for fusing the results of multiple semantic video indexing algorithms that use different types of feature descriptors and different classification methods. This method, called Context-Dependent Fusion (CDF), is motivated by the fact that the relative performance of different semantic indexing methods can vary significantly depending on the video type, the context information, and the high-level concept of the video segment to be labeled. The training part of CDF has two main components: context extraction and algorithm fusion. In context extraction, the low-level audio-visual descriptors used by the different classification algorithms are combined and used to partition the descriptor space into groups of similar video shots, or contexts. The algorithm fusion component identifies a subset of classification algorithms (local experts) for each context based on their relative performance within that context. Results on the TRECVID-2002 data collection show that the proposed method can identify meaningful and coherent clusters and that different labeling algorithms are selected for different contexts. Our initial experiments indicate that context-dependent fusion outperforms the individual algorithms. We also show that, using simple visual descriptors and a simple K-NN classifier, the CDF approach provides results comparable to state-of-the-art methods in semantic indexing.
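The two training components described above can be sketched in code. The following is a minimal, hypothetical illustration only, not the authors' implementation: it assumes k-means for context extraction and per-context validation accuracy for selecting each context's local expert, with toy nearest-centroid classifiers standing in for the semantic indexing algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=20):
    """Partition the combined descriptor space into k 'contexts'."""
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

class CentroidClassifier:
    """Nearest-class-mean classifier over a feature subset (a stand-in
    for one semantic indexing algorithm with its own descriptors)."""
    def __init__(self, dims):
        self.dims = dims
    def fit(self, X, y):
        Xd = X[:, self.dims]
        self.classes = np.unique(y)
        self.cent = np.stack([Xd[y == c].mean(axis=0) for c in self.classes])
        return self
    def predict(self, X):
        d = ((X[:, self.dims][:, None] - self.cent[None]) ** 2).sum(-1)
        return self.classes[np.argmin(d, axis=1)]

# Toy data: 2-D "descriptors" and binary concept labels (hypothetical).
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
Xtr, ytr, Xte, yte = X[:150], y[:150], X[150:], y[150:]

# Two "algorithms", each using a different descriptor subset.
experts = [CentroidClassifier([0]).fit(Xtr, ytr),
           CentroidClassifier([1]).fit(Xtr, ytr)]

# Context extraction: cluster the combined descriptors.
k = 3
centers, ctx = kmeans(Xtr, k)

# Algorithm fusion: keep the best-performing expert inside each context.
best = []
for j in range(k):
    mask = ctx == j
    if not mask.any():
        best.append(0)  # empty cluster: fall back to the first expert
        continue
    accs = [np.mean(e.predict(Xtr[mask]) == ytr[mask]) for e in experts]
    best.append(int(np.argmax(accs)))

# Labeling: route each new shot to its nearest context's local expert.
test_ctx = np.argmin(((Xte[:, None] - centers[None]) ** 2).sum(-1), axis=1)
pred = np.array([experts[best[c]].predict(Xte[i:i + 1])[0]
                 for i, c in enumerate(test_ctx)])
print("fused accuracy:", float(np.mean(pred == yte)))
```

In a full system the per-context selection would also weight or aggregate several local experts rather than keep only the single best one; the routing step, however, captures the core idea that different algorithms dominate in different regions of the descriptor space.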
