Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency

Zhiwu Qing, Nong Sang, Mingqian Tang, Ziyuan Huang, Shiwei Zhang, Rong Jin, Xiang Wang, Changxin Gao, Yi Xu

doi:10.1109/cvpr52688.2022.01345

Abstract

Natural videos provide rich visual content for self-supervised learning. Yet most existing approaches for learning spatio-temporal representations rely on manually trimmed videos, leading to limited diversity in visual patterns and limited performance gain. In this work, we aim to learn representations by leveraging the more abundant information in untrimmed videos. To this end, we propose to learn a hierarchy of consistencies in videos, i.e., visual consistency and topical consistency, corresponding respectively to clip pairs that tend to be visually similar when separated by a short time span and that share similar topics when separated by a long time span. Specifically, a hierarchical consistency learning framework, HiCo, is presented, where the visually consistent pairs are encouraged to have the same representation through contrastive learning, while the topically consistent pairs are coupled through a topical classifier that distinguishes whether they are topic-related. Further, we impose a gradual sampling algorithm for the proposed hierarchical consistency learning and demonstrate its theoretical superiority. Empirically, we show that HiCo not only generates stronger representations on untrimmed videos but also improves representation quality when applied to trimmed videos. This is in contrast to standard contrastive learning, which fails to learn appropriate representations from untrimmed videos.
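The two consistency levels described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`sample_clip_pair`, `info_nce`, `topic_bce`), the uniform sampling rule, the temperature value, and the choice of an InfoNCE-style loss and a binary cross-entropy loss are all assumptions made for illustration. The idea shown is only that a short temporal gap yields a pair for the contrastive term, while a long gap yields a pair scored by a binary "topic-related or not" classifier.

```python
import math
import random


def sample_clip_pair(video_len, clip_len, max_gap, rng):
    """Return start times (seconds) of two clips whose starts are at most max_gap apart.

    Hypothetical helper: a small max_gap gives a 'visually consistent' pair,
    a large max_gap gives a 'topically consistent' pair.
    """
    last_start = video_len - clip_len
    if last_start < 0:
        raise ValueError("video is shorter than one clip")
    first = rng.uniform(0.0, last_start)
    lo = max(0.0, first - max_gap)
    hi = min(last_start, first + max_gap)
    second = rng.uniform(lo, hi)
    return first, second


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (assumed form): low when anchor is close to positive."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denom - logits[0]


def topic_bce(logit, topic_related):
    """Binary cross-entropy on a topical classifier logit (assumed form)."""
    p = 1.0 / (1.0 + math.exp(-logit))
    eps = 1e-12
    return -math.log(p + eps) if topic_related else -math.log(1.0 - p + eps)


rng = random.Random(0)
# Short gap: candidate pair for the contrastive (visual consistency) term.
short_pair = sample_clip_pair(video_len=120.0, clip_len=2.0, max_gap=1.0, rng=rng)
# Long gap: candidate pair for the topical classifier (topical consistency) term.
long_pair = sample_clip_pair(video_len=120.0, clip_len=2.0, max_gap=60.0, rng=rng)

# Toy embeddings standing in for encoder outputs.
loss_aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
loss_misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
loss_topic = topic_bce(0.0, topic_related=True)
```

In this toy setup the contrastive loss is smaller when the anchor matches its positive than when it matches a negative, and an uninformative classifier logit of 0 gives a topical loss of about ln 2. The gradual sampling algorithm mentioned in the abstract is not reproduced here, since the abstract does not specify it.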

Similar Papers

Self-Supervised Learning from Untrimmed Videos via Hierarchical Consistency
Zhiwu Qing ... Yi Xu
IEEE Transactions on Pattern Analysis and Machine Intelligence | VOL. 45
01 Jan 2023

Exploring Relations in Untrimmed Videos for Self-Supervised Learning
Dezhao Luo ... Yu Zhou
ACM Transactions on Multimedia Computing, Communications, and Applications | VOL. 18
25 Jan 2022

Spatio-Temporal Action Detection in Untrimmed Videos by Using Multimodal Features and Region Proposals
Yeongtaek Song ... Incheol Kim
Sensors | VOL. 19
03 Mar 2019

PDAN: Pyramid Dilated Attention Network for Action Detection
Rui Dai ... Lorenzo Garattoni
01 Jan 2020
