Abstract

This paper focuses on the comparison between two fusion methods, namely early fusion and late fusion. The former fusion is carried out at kernel level, also known as multiple kernel learning, and in the latter, the modalities are fused through logistic regression at classifier score level. Two kinds of multilayer fusion structures, differing in the quantities of feature/kernel groups in a lower fusion layer, are constructed for early and late fusion systems, respectively. The goal of these fusion methods is to put each of various features into effect and mine redundant information of the combination of them, and then to develop a generic and robust semantic indexing system to bridge semantic gap between human concepts and these low-level visual features. Performance evaluated on both TRECVID2009 and TRECVID2010 datasets demonstrates that the systems with our proposed multilayer fusion methods at kernel level perform more stably to reach the goal than the classification-score-level fusion; the most effective and robust one with highest MAP score is constructed by early fusion with two-layer equally weighted composite kernel learning.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call