Abstract

Choosing training videos via active learning can significantly reduce the annotation cost of supervised learning on large-scale video datasets without sacrificing classifier accuracy. To further reduce the computational overhead of exhaustive comparisons between high-dimensional low-level visual features, we develop a novel video hashing model within an active learning framework. The model optimizes mean average precision in a straightforward way by explicitly exploiting the structure information in video clips when learning the optimal hash functions: the temporal consistency between successive frames and the local visual patterns shared by videos with the same semantic labels. Rather than relying on the similarity between paired videos, we use a ranking-based loss function to directly optimize mean average precision. Combined with the active learning component, we jointly evaluate unlabeled training videos according to their uncertainty and average precision values using an efficient algorithm based on structured SVM. Extensive experiments on several benchmark datasets demonstrate that our approach achieves significantly higher search accuracy than traditional query refinement schemes.
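To make the ranking metric concrete: average precision rewards rankings that place relevant videos near the top, and mean average precision averages it over queries. Below is a minimal sketch of the standard metric the loss targets; the function names are ours for illustration, not identifiers from the paper.

```python
def average_precision(relevance):
    """Average precision of one ranked retrieval list.

    relevance: sequence of 0/1 labels, ordered from top-ranked
    result to bottom-ranked result.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            # Precision at the rank of each relevant item.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of per-query average precision values."""
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

For example, the ranking `[1, 0, 1]` scores (1/1 + 2/3) / 2 = 5/6, while `[1, 1, 0]` scores a perfect 1.0, which is why a loss that directly optimizes this quantity favors pushing all relevant videos above irrelevant ones.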

Highlights

  • Due to the popularity of digital cameras and the emergence of social media websites, such as YouTube, Facebook and Youku, the amount of videos that can be accessed by everyone has increased exponentially over the past decade

  • We summarize the joint active video hashing learning with structure information in Algorithm 1

  • We report performance curves in terms of mean Average Precision of different methods in Table 1 and Table 2

Introduction

Due to the popularity of digital cameras and the emergence of social media websites, such as YouTube, Facebook and Youku, the amount of videos that can be accessed by everyone has increased exponentially over the past decade. Facing such a proliferation of videos, efficiently analyzing and understanding video content at large scale has become a significant challenge in computer vision [1]–[3]. In machine learning, training videos with high-quality annotations are definitely beneficial to better performance. However, labeling videos at large scale is usually tedious and time consuming; in particular, accurate and reliable annotations in some specialized domains must be provided by experts with the corresponding background knowledge. These issues severely limit the application of supervised learning in practice, where a large amount of videos are collected without labels.
