Cross-view motion consistent self-supervised video inter-intra contrastive for action representation understanding

Shuai Bi, Zhengping Hu, Hehao Zhang, Jirui Di, Zhe Sun

doi:10.1016/j.neunet.2024.106578

Abstract

Self-supervised contrastive learning draws on powerful representational models to acquire generic semantic features from unlabeled data, and the key to training such models lies in how accurately motion features are tracked. Previous video contrastive learning methods have extensively used spatial or temporal augmentations as similar instances, resulting in models that are more likely to learn static backgrounds than motion features. To alleviate these background shortcuts, we propose a cross-view motion consistent (CVMC) self-supervised video inter-intra contrastive model that focuses on learning local details and long-term temporal relationships. Specifically, we first extract the dynamic features of consecutive video snippets and then align these features based on multi-view motion consistency. Meanwhile, we use the optimized dynamic features for two comparisons: instance discrimination across different videos, and contrast of local fine-grained spatial details and temporal order within the same video. Ultimately, the joint optimization of spatio-temporal alignment and motion discrimination addresses the components that self-supervised learning otherwise lacks: instance recognition, spatial compactness, and temporal perception. Experimental results show that the proposed self-supervised model effectively learns visual representations and achieves highly competitive performance compared with other state-of-the-art methods in both action recognition and video retrieval tasks.
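The abstract does not give the loss functions, so the following is only a minimal sketch of how an inter-intra contrastive objective is commonly set up, not the authors' CVMC implementation. It assumes a standard InfoNCE loss over cosine similarities; the function names `cosine` and `info_nce`, the temperature value, and the toy feature vectors are all hypothetical. Inter-video negatives stand for features of other videos, and intra-video negatives stand for features from the same video (for example, a snippet with disturbed temporal order).

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss: -log softmax score of the positive pair.

    This is the textbook formulation, assumed here for illustration only.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denom - logits[0]


# Toy 2-D features (hypothetical values).
anchor = [1.0, 0.0]               # snippet feature from view 1
positive = [0.9, 0.1]             # same snippet seen from view 2
inter_negatives = [[0.0, 1.0], [-1.0, 0.0]]  # features of other videos
intra_negatives = [[0.7, 0.7]]    # same video, e.g. disturbed temporal order

inter_loss = info_nce(anchor, positive, inter_negatives)
intra_loss = info_nce(anchor, positive, intra_negatives)
total_loss = inter_loss + intra_loss  # equal weighting is an assumption
```

Summing the two terms with equal weight is one simple choice; the weighting actually used in the paper is not stated in the abstract.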
