Abstract

Online social video websites such as YouTube allow users to manually annotate their videos with textual labels. These labels can serve as indexing keywords to facilitate the search and organization of video data. However, manual video annotation is a labor-intensive and time-consuming process. In this work, we propose a novel social video annotation approach that combines multiple feature sets through a tri-adaptation approach. The shots in each video are annotated by aggregating models learned from three complementary feature sets, and the models are collaboratively adapted by exploring unlabeled shots. In this sense, the method can be viewed as a novel semi-supervised algorithm that exploits three complementary views. Our approach also leverages the temporal smoothness of video labels by applying a label correction strategy. Experiments on a web video dataset demonstrate the effectiveness of the proposed approach.
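To make the idea concrete, the following is a minimal sketch of a tri-view, co-training-style annotation loop with a temporal label-correction pass, in the spirit of the tri-adaptation approach summarized above. The feature views, classifiers, agreement rule, thresholds, and function names are illustrative assumptions for exposition, not the paper's actual formulation.

```python
# Sketch of a tri-view semi-supervised annotation loop (assumed illustration,
# not the authors' exact tri-adaptation algorithm).
import numpy as np
from sklearn.linear_model import LogisticRegression


def tri_view_annotate(views_labeled, y, views_unlabeled,
                      n_rounds=5, conf_thresh=0.9):
    """views_labeled / views_unlabeled: lists of three feature matrices,
    one per complementary view; y: labels of the labeled shots."""
    models = [LogisticRegression(max_iter=1000).fit(X, y) for X in views_labeled]
    Xl = [X.copy() for X in views_labeled]
    yl = [y.copy() for _ in range(3)]

    for _ in range(n_rounds):
        probs = [m.predict_proba(Xu) for m, Xu in zip(models, views_unlabeled)]
        preds = [p.argmax(axis=1) for p in probs]
        confs = [p.max(axis=1) for p in probs]
        for i in range(3):
            # Adapt view i with unlabeled shots on which the other two
            # views agree with high confidence (co-training style).
            j, k = [v for v in range(3) if v != i]
            agree = ((preds[j] == preds[k])
                     & (confs[j] > conf_thresh) & (confs[k] > conf_thresh))
            if agree.any():
                pseudo = models[j].classes_[preds[j][agree]]
                Xl[i] = np.vstack([Xl[i], views_unlabeled[i][agree]])
                yl[i] = np.concatenate([yl[i], pseudo])
                models[i] = LogisticRegression(max_iter=1000).fit(Xl[i], yl[i])

    # Final annotation: aggregate the three adapted models by averaging scores.
    avg = sum(m.predict_proba(Xu) for m, Xu in zip(models, views_unlabeled)) / 3.0
    labels = models[0].classes_[avg.argmax(axis=1)]
    return labels, avg.max(axis=1)


def smooth_labels(labels, confidences, window=1, conf_floor=0.5):
    """Illustrative temporal label correction: replace a weakly scored shot
    label with the majority label of its temporal neighborhood."""
    corrected = labels.copy()
    for t in range(len(labels)):
        lo, hi = max(0, t - window), min(len(labels), t + window + 1)
        vals, counts = np.unique(labels[lo:hi], return_counts=True)
        majority = vals[counts.argmax()]
        if confidences[t] < conf_floor and majority != labels[t]:
            corrected[t] = majority
    return corrected
```

In this sketch each view's model is refined only with pseudo-labels that the other two views agree on, which is one common way to realize multi-view semi-supervised adaptation; the label-correction pass then enforces temporal smoothness over consecutive shots.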
