Unsupervised learning from videos using temporal coherency deep networks

Carolina Redondo-Cabrera, Roberto Lopez-Sastre

doi:10.1016/j.cviu.2018.08.003

Abstract

In this work we address the challenging problem of unsupervised learning from videos. Existing methods use the spatio-temporal continuity of contiguous video frames as a regularizer for the learning process. Typically, the temporal coherence of nearby frames serves as a free form of annotation, encouraging the learned representations of these frames to differ only slightly. However, this type of approach fails to capture the dissimilarity between videos with different content, and therefore learns less discriminative features. Here we propose two Siamese architectures for Convolutional Neural Networks, with corresponding novel loss functions, to learn from unlabeled videos. These models jointly exploit the local temporal coherence between contiguous frames and a global discriminative margin that separates the representations of different videos. We present an extensive experimental evaluation that validates the proposed models on various tasks. First, we show how the learned features can be used to discover actions and scenes in video collections. Second, we show the benefits of this unsupervised learning from unlabeled videos alone: the learned features can be used directly as a prior for the supervised recognition of actions and objects in images, where our results further show that they can even surpass a traditional, heavily supervised pre-training plus fine-tuning strategy.
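The abstract does not state the loss functions themselves, so the following is only a minimal sketch of the general idea it describes: a local term that pulls together the representations of contiguous frames, plus a global term that pushes representations of different videos apart by a margin. The function name `coherence_margin_loss`, the squared-distance and squared-hinge forms, and the default `margin` value are all assumptions made for illustration and should not be read as the authors' formulation.

```python
import math


def coherence_margin_loss(anchor, neighbor, other, margin=1.0):
    """Illustrative loss only; NOT the formulation from the paper.

    anchor, neighbor: feature vectors of two contiguous frames of one video.
    other: feature vector of a frame taken from a different video.
    margin: assumed minimum distance wanted between different videos.
    """
    # Local temporal coherence: contiguous frames should have close features.
    local = math.dist(anchor, neighbor) ** 2
    # Global discriminative margin: penalize a frame from another video
    # only when it lies closer to the anchor than the margin.
    separation = max(0.0, margin - math.dist(anchor, other)) ** 2
    return local + separation


# Identical contiguous frames and a distant frame from another video: no penalty.
well_separated = coherence_margin_loss([0.0, 0.0], [0.0, 0.0], [3.0, 4.0])
# Contiguous frames drift apart and the other video falls inside the margin.
poorly_separated = coherence_margin_loss([0.0, 0.0], [1.0, 0.0], [0.0, 0.5])
```

In this sketch a loss with only the first term would be minimized by mapping every frame to the same point, which is one way to see why a separation term between different videos makes the features more discriminative.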

Journal: Computer Vision and Image Understanding
Publication Date: Aug 30, 2018
Citations: 38

Similar Papers

Object-Centric Representation Learning from Unlabeled Videos
Ruohan Gao ... Kristen Grauman
01 Jan 2017

RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning
Peihao Chen ... Runhao Zeng
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 35
18 May 2021

Local Temporal Coherence for Object-Aware Keypoint Selection in Video Sequences
Songlin Du ... Takeshi Ikenaga
01 Jan 2018

Unsupervised Learning in Space and Time over Several Generations of Teacher and Student Networks
Marius Leordeanu
01 Jan 2020
