Abstract

This work addresses self-supervised learning of video representations. Prior work constructs different surrogate supervision signals from the data itself. Rather than proposing yet another signal, our main insight is that self-supervised learning can benefit from mutual learning: these supervision signals can learn from one another, and combining them leads to better representations. Building on this insight, we present Self-supervised Mutual (SSM) Learning, a simple framework for mutual learning of video representations in a self-supervised setting. To understand what enables the task to learn useful representations, we systematically study the major components of our framework. We show that (1) a surrogate supervision signal can learn effectively from others under the mutual-learning framework, and (2) introducing a learnable align unit between the deep features supervised by the different signals in the hidden space improves the quality of the learned representation. By combining these findings, we considerably outperform previous self-supervised learning methods on HMDB51 and UCF101 when the representations are applied to action recognition.
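
The abstract does not give implementation details, but a minimal sketch may help illustrate the two findings above: surrogate signals trained jointly, coupled by a learnable align unit between their hidden features. Everything below is an assumption for illustration, not the paper's actual method: the backbone, the two example pretext tasks (clip-order and playback-speed prediction), the module names, and the equal loss weighting are all placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sketch only: backbone, pretext tasks, and loss weights are
# assumptions, chosen to show how two surrogate signals could be trained
# jointly with a learnable align unit between their hidden features.


class AlignUnit(nn.Module):
    """Learnable mapping between the hidden features of two pretext branches."""

    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, feat_a, feat_b):
        # Map branch A's features into branch B's feature space and
        # penalize the remaining discrepancy.
        return F.mse_loss(self.proj(feat_a), feat_b)


class SSMSketch(nn.Module):
    def __init__(self, feat_dim=512, n_order_classes=4, n_speed_classes=3):
        super().__init__()
        # Shared video encoder (placeholder; a 3D CNN would be used in practice).
        self.encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        # Two example surrogate tasks: clip-order and playback-speed prediction.
        self.order_head = nn.Linear(feat_dim, n_order_classes)
        self.speed_head = nn.Linear(feat_dim, n_speed_classes)
        self.align = AlignUnit(feat_dim)

    def forward(self, clips_a, clips_b, order_labels, speed_labels):
        feat_a = self.encoder(clips_a)  # features supervised by the order signal
        feat_b = self.encoder(clips_b)  # features supervised by the speed signal
        loss_order = F.cross_entropy(self.order_head(feat_a), order_labels)
        loss_speed = F.cross_entropy(self.speed_head(feat_b), speed_labels)
        loss_align = self.align(feat_a, feat_b)
        # Equal weighting is an arbitrary choice for this sketch.
        return loss_order + loss_speed + loss_align


if __name__ == "__main__":
    model = SSMSketch()
    clips_a = torch.randn(8, 3, 16, 32, 32)  # (batch, C, T, H, W) dummy clips
    clips_b = torch.randn(8, 3, 16, 32, 32)
    order_y = torch.randint(0, 4, (8,))
    speed_y = torch.randint(0, 3, (8,))
    loss = model(clips_a, clips_b, order_y, speed_y)
    loss.backward()
    print(float(loss))
```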
