Abstract

Existing video self-supervised learning methods rely mainly on trimmed videos for model training, applying and validating their methods on trimmed video datasets such as UCF101 and Kinetics-400. However, trimmed datasets are manually annotated from untrimmed videos, so these methods are not truly unsupervised. In this article, we propose a novel self-supervised method, referred to as Exploring Relations in Untrimmed Videos (ERUV), which can be applied directly to untrimmed (truly unlabeled) videos to learn spatio-temporal features. ERUV first generates single-shot videos by shot change detection. Designed sampling strategies are then used to model relations between video clips, and these strategies serve as our self-supervision signals. Finally, the network learns representations by predicting the category of the relation between video clips. ERUV thus learns to compare the differences and similarities of video clips, which is also an essential procedure for video-related tasks. We validate the learned models on action recognition, video retrieval, and action similarity labeling tasks with four kinds of 3D convolutional neural networks. Experimental results show that ERUV learns richer representations from untrimmed videos and outperforms state-of-the-art self-supervised methods by significant margins.
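The pretext task described above, sampling pairs of clips under a chosen strategy and training the network to predict which relation category that strategy encodes, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the backbone, feature dimension, number of relation categories, and the stand-in networks are all hypothetical, since the abstract does not specify them.

```python
# Minimal sketch of a clip-relation prediction pretext task (assumed setup;
# the paper's actual sampling strategies and architecture are not given here).
import torch
import torch.nn as nn


class RelationPredictor(nn.Module):
    """Predicts the relation category between two video clips.

    `backbone` is any 3D CNN mapping a clip (B, C, T, H, W) to a feature
    vector (B, D); in practice this could be C3D, R3D, or similar.
    """

    def __init__(self, backbone: nn.Module, feat_dim: int, num_relations: int):
        super().__init__()
        self.backbone = backbone
        # Classify the concatenated pair of clip features into a relation class.
        self.head = nn.Linear(2 * feat_dim, num_relations)

    def forward(self, clip_a: torch.Tensor, clip_b: torch.Tensor) -> torch.Tensor:
        feat_a = self.backbone(clip_a)  # (B, D)
        feat_b = self.backbone(clip_b)  # (B, D)
        return self.head(torch.cat([feat_a, feat_b], dim=1))  # (B, num_relations)


if __name__ == "__main__":
    # Stand-in backbone for demonstration only; replace with a real 3D CNN.
    backbone = nn.Sequential(nn.Flatten(), nn.LazyLinear(512))
    model = RelationPredictor(backbone, feat_dim=512, num_relations=4)

    # Two clips: (batch, channels, frames, height, width).
    clip_a = torch.randn(2, 3, 8, 16, 16)
    clip_b = torch.randn(2, 3, 8, 16, 16)

    # The sampling strategy that produced each pair defines its ground-truth
    # relation label, which supervises the prediction (labels here are dummies).
    labels = torch.tensor([0, 2])
    logits = model(clip_a, clip_b)
    loss = nn.functional.cross_entropy(logits, labels)
    loss.backward()
```

The key design point this sketch captures is that no manual annotation is needed: the relation label is a free byproduct of how the clip pair was sampled, so the supervisory signal comes entirely from the untrimmed video itself.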
