Abstract

This paper introduces an online self-supervised method that leverages inter- and intra-level variance for video representation learning. Most existing methods focus on instance-level (inter-variance) encoding but ignore the intra-variance present within clips. The key observation behind our solution is the underlying correlation between the visual and audio streams: the distributions of their flow patterns in feature space are diverse, yet they express complementary, similar semantics. Moreover, in the semantic feature space, the horizontal dimension of the feature matrix can be regarded as cluster labels, and these cluster labels should be consistent across different modalities of the same video clip. Based on this idea, we propose an end-to-end inter-intra cross-modality contrastive clustering scheme that simultaneously optimizes the inter- and intra-level contrastive losses. Experiments show that the proposed approach considerably outperforms previous self-supervised methods on HMDB51 and UCF101 when applied to video retrieval and action recognition tasks.
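The core idea above can be sketched in code. The following is a minimal illustrative sketch, not the paper's exact loss: it assumes each modality encoder outputs an N-by-K feature matrix (N clips, K semantic dimensions), contrasts matching rows across modalities at the inter (instance) level, and contrasts matching columns of the softmax-normalized matrices at the intra (cluster) level, treating columns as soft cluster assignments. All function names and the temperature value are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=1):
    # Numerically stable row-wise softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def l2_normalize(x, axis=1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def info_nce(a, b, tau=0.5):
    # a, b: (M, D) L2-normalized; row i of a is the positive of row i of b.
    logits = (a @ b.T) / tau
    logits -= logits.max(axis=1, keepdims=True)  # stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

def inter_intra_loss(z_video, z_audio, tau=0.5):
    # Inter (instance) level: rows are per-clip embeddings;
    # pull matching video/audio clips together.
    inter = info_nce(l2_normalize(z_video), l2_normalize(z_audio), tau)
    # Intra (cluster) level: each softmax column is a soft cluster-
    # assignment vector over the batch; matching columns of the two
    # modalities should describe the same cluster.
    pv, pa = softmax(z_video), softmax(z_audio)
    intra = info_nce(l2_normalize(pv.T), l2_normalize(pa.T), tau)
    return inter + intra
```

In this sketch the two terms are simply summed; a weighting coefficient between the inter- and intra-level terms would be a natural hyperparameter.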
