Abstract

Automatic multimedia annotation, which assigns text labels to multimedia objects, has been widely studied. However, existing methods usually model only two media types or pairwise correlations. In fact, heterogeneous media are complementary to each other, and optimizing them simultaneously can further improve accuracy. In this paper, a novel common space learning (CSL) algorithm for multimedia integrated annotation is presented, by which heterogeneous media data are projected into a unified space and multimedia annotation is transformed into a nearest neighbor search in that space. Optimizing the heterogeneous media simultaneously makes them complementary to each other and aligned in the common space. We formulate CSL as an optimization problem that mainly considers the following issues. First, media objects of different types with similar labels should be close in the common space. Second, media similarities in the original space and the common space should be consistent. We solve the optimization problem in a sparse and semi-supervised learning framework, so that unlabeled data can be integrated into the learning process, which boosts the performance of space learning. In addition, we propose an iterative optimization algorithm to solve the problem. Since the projected samples in the common space share the same representation, the labels of a new media object are assigned by a simple nearest neighbor voting mechanism. To the best of our knowledge, our method is the first attempt at multimedia integrated annotation. Experiments on data sets with up to four media types (image, sound, video, and 3D model) show the effectiveness of the proposed approach compared with state-of-the-art methods.
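The final annotation step described above, majority voting over nearest neighbors in the learned common space, can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the media objects have already been projected into the common space, and the function name, distance metric (Euclidean), and toy data are all hypothetical.

```python
import numpy as np
from collections import Counter

def knn_vote_label(query, projected, labels, k=5):
    """Assign a label to a query embedding by majority vote over its
    k nearest neighbors in the common space (Euclidean distance).

    query:     (d,) embedding of the new media object in the common space
    projected: (n, d) embeddings of labeled training objects (any media type,
               since all types share the same representation after projection)
    labels:    length-n list of label strings
    """
    dists = np.linalg.norm(projected - query, axis=1)  # distance to each sample
    nearest = np.argsort(dists)[:k]                    # indices of k closest
    votes = Counter(labels[i] for i in nearest)        # count label votes
    return votes.most_common(1)[0][0]                  # majority label

# Toy example: two label clusters in a 2-D common space
train = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [1.0, 1.0], [1.1, 1.0], [1.0, 1.1]])
tags = ["cat", "cat", "cat", "car", "car", "car"]
print(knn_vote_label(np.array([0.05, 0.05]), train, tags, k=3))  # → cat
```

Because all media types share the same representation in the common space, the same voting routine works regardless of whether the query is an image, a sound, a video, or a 3D model.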
