Abstract
Self-supervised learning is a promising paradigm for reducing reliance on manual annotation by effectively leveraging unlabeled videos. By solving self-supervised pretext tasks, powerful video representations can be discovered automatically. However, recent pretext tasks for videos rely on exploiting the temporal properties of videos and ignore the crucial supervisory signals available in the spatial subspace. We therefore present a new self-supervised pretext task, Multi-Label Transformation Prediction (MLTP), to fully utilize the spatiotemporal information in videos. In MLTP, all videos are jointly transformed by a set of geometric and color-space transformations, such as rotation, cropping, and color-channel splitting. We formulate the pretext as a multi-label prediction task: a 3D CNN is trained to predict the composition of underlying transformations as multiple outputs. In this way, transformation-invariant video features can be learned in a self-supervised manner. Experimental results verify that 3D CNNs pre-trained with MLTP yield video representations with improved generalization performance on the action recognition downstream tasks UCF101 ([Formula: see text]) and HMDB51 ([Formula: see text]).
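To make the idea concrete, below is a minimal PyTorch-style sketch of the MLTP training setup described in the abstract, assuming a toy transformation set (rotation, center crop, color-channel split), a stand-in `Tiny3DCNN` backbone, and a binary-cross-entropy multi-label objective. These names, transformations, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Minimal MLTP-style sketch (illustrative, not the paper's code): each clip is
# transformed by a random subset of transformations, and a small 3D CNN is
# trained to predict which transformations were applied (multi-label BCE).
import torch
import torch.nn as nn

def rotate90(clip):            # clip: (C, T, H, W)
    return torch.rot90(clip, k=1, dims=(2, 3))

def center_crop_resize(clip):  # crop the spatial center half, resize back
    C, T, H, W = clip.shape
    crop = clip[:, :, H // 4: 3 * H // 4, W // 4: 3 * W // 4]
    return nn.functional.interpolate(crop, size=(H, W), mode="bilinear",
                                     align_corners=False)

def channel_split(clip):       # keep one color channel, replicate it
    return clip[0:1].repeat(3, 1, 1, 1)

TRANSFORMS = [rotate90, center_crop_resize, channel_split]

def make_example(clip):
    """Apply a random subset of transformations; return clip + multi-hot label."""
    label = torch.zeros(len(TRANSFORMS))
    for i, t in enumerate(TRANSFORMS):
        if torch.rand(1).item() < 0.5:
            clip = t(clip)
            label[i] = 1.0
    return clip, label

class Tiny3DCNN(nn.Module):
    """Stand-in backbone; a real setup would use e.g. a C3D/R3D-style network."""
    def __init__(self, num_labels):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.head = nn.Linear(32, num_labels)  # one logit per transformation

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = Tiny3DCNN(num_labels=len(TRANSFORMS))
criterion = nn.BCEWithLogitsLoss()            # multi-label objective
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# one illustrative training step on random clips shaped (B, C, T, H, W)
clips, labels = zip(*(make_example(torch.rand(3, 8, 64, 64)) for _ in range(4)))
clips, labels = torch.stack(clips), torch.stack(labels)
loss = criterion(model(clips), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

After pre-training with this multi-label objective, the backbone weights would be transferred and fine-tuned on the downstream action recognition datasets, which is the evaluation protocol the abstract reports for UCF101 and HMDB51.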