VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples

Tian Pan, Wenhao Jiang, Yibing Song, Tianyu Yang, Wei Liu

doi:10.1109/cvpr46437.2021.01105

Abstract

MoCo [11] is effective for unsupervised image representation learning. In this paper, we propose VideoMoCo for unsupervised video representation learning. Given a video sequence as an input sample, we improve the temporal feature representations of MoCo from two perspectives. First, we introduce a generator that temporally drops out several frames from this sample. The discriminator is then trained to encode similar feature representations regardless of which frames are removed. By adaptively dropping out different frames across the training iterations of adversarial learning, we augment the input sample to train a temporally robust encoder. Second, we use temporal decay to model key attenuation in the memory queue when computing the contrastive loss. Because the momentum encoder keeps updating after keys are enqueued, the representation ability of these keys degrades by the time we use the current input sample for contrastive learning. We reflect this degradation via temporal decay, so that the input sample attends more to recent keys in the queue. As a result, we adapt MoCo to learn video representations without empirically designing pretext tasks. By strengthening the temporal robustness of the encoder and modeling the temporal decay of the keys, VideoMoCo improves MoCo temporally within the contrastive learning framework. Experiments on benchmark datasets including UCF101 and HMDB51 show that VideoMoCo is a state-of-the-art video representation learning method.
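The temporal decay idea can be illustrated with a small sketch of an InfoNCE-style contrastive loss in which older keys in the queue contribute less to the denominator. The abstract does not give the exact formula, so the weighting `decay ** age`, the newest-first queue ordering, the function name `decayed_contrastive_loss`, and the default values of `temperature` and `decay` are all assumptions made for illustration, not the authors' stated implementation.

```python
import math


def decayed_contrastive_loss(query, positive_key, queue_keys,
                             temperature=0.07, decay=0.999):
    """InfoNCE-style loss with age-based down-weighting of queue keys.

    Assumptions (not taken from the abstract):
      * queue_keys is ordered newest first, so index i is the key's age;
      * a key of age i is weighted by decay ** i, with 0 < decay <= 1.
    Vectors are plain lists of floats, ideally L2-normalized.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Similarity of the query to its positive key.
    positive_term = math.exp(dot(query, positive_key) / temperature)

    # Negatives from the memory queue; older keys are attenuated.
    negative_term = sum(
        (decay ** age) * math.exp(dot(query, key) / temperature)
        for age, key in enumerate(queue_keys)
    )

    return -math.log(positive_term / (positive_term + negative_term))


if __name__ == "__main__":
    q = [1.0, 0.0]
    k_pos = [1.0, 0.0]
    queue = [[0.0, 1.0], [0.6, 0.8]]  # newest key first
    print(decayed_contrastive_loss(q, k_pos, queue, temperature=1.0, decay=1.0))
    print(decayed_contrastive_loss(q, k_pos, queue, temperature=1.0, decay=0.5))
```

With `decay=1.0` the function reduces to the usual unweighted contrastive loss over the queue; any `decay` below 1 shrinks the contribution of stale keys, which is the effect the abstract describes as attending the input sample to recent keys.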


