Abstract

Choosing training videos via active learning can significantly reduce the annotation cost of supervised learning on large-scale video datasets without sacrificing classifier accuracy. To further reduce the computational overhead of exhaustive comparisons between high-dimensional low-level visual features, we develop a novel video hashing model within an active learning framework. The model optimizes mean average precision in a straightforward way by explicitly exploiting the structure information in video clips when learning the optimal hash functions: the temporal consistency between successive frames and the local visual patterns shared by videos with the same semantic labels. Rather than relying on the similarity between paired videos, we use a ranking-based loss function to directly optimize mean average precision. Combined with the active learning component, we jointly evaluate unlabeled training videos according to their uncertainty and average precision values using an efficient algorithm based on structured SVM. Extensive experiments on several benchmark datasets demonstrate that our approach achieves significantly higher search accuracy than traditional query refinement schemes.
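To make the ranking metric concrete: average precision rewards rankings that place relevant videos near the top, and mean average precision averages it over queries. Below is a minimal sketch of the standard metric the loss targets; the function names are ours for illustration, not identifiers from the paper.

```python
def average_precision(relevance):
    """Average precision of one ranked retrieval list.

    relevance: sequence of 0/1 labels, ordered from top-ranked
    result to bottom-ranked result.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            # Precision at the rank of each relevant item.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of per-query average precision values."""
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

For example, the ranking `[1, 0, 1]` scores (1/1 + 2/3) / 2 = 5/6, while `[1, 1, 0]` scores a perfect 1.0, which is why a loss that directly optimizes this quantity favors pushing all relevant videos above irrelevant ones.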

Highlights

  • Due to the popularity of digital cameras and the emergence of social media websites, such as YouTube, Facebook and Youku, the amount of videos that can be accessed by everyone has increased exponentially over the past decade

  • We summarize the joint active video hashing learning with structure information in Algorithm 1

  • We report performance curves in terms of mean Average Precision of different methods in Table 1 and Table 2

Introduction

Due to the popularity of digital cameras and the emergence of social media websites, such as YouTube, Facebook and Youku, the amount of videos that can be accessed by everyone has increased exponentially over the past decade. Facing such a proliferation of videos, efficiently analyzing and understanding video content at large scale has become a significant challenge in computer vision [1]–[3]. In machine learning, training videos with high-quality annotations are definitely beneficial to better performance. However, labeling videos at large scale is usually tedious and time consuming; in particular, accurate and reliable annotations in some specialized domains must be provided by experts with the corresponding background knowledge. These issues severely limit the application of supervised learning in practice, where a large amount of videos are collected without labels.
