Continuous frame motion sensitive self-supervised collaborative network for video representation learning

Shuai Bi, Zhengping Hu, Mengyao Zhao, Hehao Zhang, Jirui Di, Zhe Sun

doi:10.1016/j.aei.2023.101941

Abstract

Motion, the feature of video that changes over time, is crucial to visual understanding. Powerful video representation and extraction models can typically focus attention on motion features in challenging dynamic environments, which lets them complete more complex video understanding tasks. However, previous approaches discriminate mainly on the basis of similar features in the spatial or temporal domain and ignore the interdependence of consecutive video frames. In this paper, we propose the motion sensitive self-supervised collaborative network, a video representation learning framework that uses a pretext task to assist feature comparison and strengthen the spatiotemporal discrimination power of the model. Specifically, we first propose the motion-aware module, which extracts consecutive motion features from spatial regions by frame difference. We then introduce the global–local contrastive module, in which context and enhanced video snippets are defined as positive samples for a broader feature similarity comparison. Finally, we introduce the snippet operation prediction module, which further assists contrastive learning in obtaining more reliable global semantics by sensing changes in continuous frame features. Experimental results demonstrate that our method effectively extracts robust motion features and achieves competitive performance compared with other state-of-the-art self-supervised methods on downstream action recognition and video retrieval tasks.
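To make the two core ideas in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It shows (a) frame differencing over consecutive frames and (b) a contrastive loss that scores an anchor against one positive and several negatives. The function names `frame_differences`, `cosine`, and `info_nce`, the InfoNCE form of the loss, and the temperature value are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def frame_differences(frames):
    """Absolute per-pixel difference between each pair of consecutive frames.

    `frames` is a list of T frames, each a list of rows of pixel intensities.
    Returns T-1 difference maps with the same spatial size as the input.
    """
    diffs = []
    for prev, curr in zip(frames, frames[1:]):
        diffs.append([
            [abs(c - p) for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev, curr)
        ])
    return diffs


def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumed choice, not confirmed by the source).

    The loss is small when the anchor is closer to the positive than to
    any of the negatives, and large otherwise.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))
```

In this sketch, a static region yields zeros in every difference map, so only moving regions contribute signal. In a contrastive setup like the one the abstract describes, the positive would be a feature from a context or enhanced snippet of the same video, and the negatives would come from other videos.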

Journal: Advanced Engineering Informatics
Publication Date: Mar 16, 2023
Citations: 5

Similar Papers

Cycle representation-disentangling network: learning to completely disentangle spatial-temporal features in video
Pengfei Sun ... Feng Chen
Applied Intelligence | VOL. 50
15 Jul 2020

SSAN: Separable Self-Attention Network for Video Representation Learning
Xudong Guo ... Yan Lu
01 Jun 2021

Inter-Intra Cross-Modality Self-Supervised Video Representation Learning by Contrastive Clustering
Jiutong Wei ... Guan Luo
21 Aug 2022

Study on Various Self-supervised Video Representation Learning Methods
Soohyun Park ... Jongwon Choi
Moving Image & Technology (MINT) | VOL. 2
31 Aug 2022
