Abstract

Self-supervised learning has emerged as a candidate approach for learning semantic visual features from unlabeled video data. In self-supervised learning, intrinsic correspondences between data points are used to define a proxy task that forces the model to learn semantic representations. Most existing proxy tasks applied to video data exploit either intra-modal (e.g., temporal) or cross-modal (e.g., audio-visual) correspondences, but not both. In theory, jointly learning both types of correspondence should yield richer visual features; as we show in this work, however, doing so is non-trivial in practice. To address this problem, we introduce ‘Audio-Visual Permutative Predictive Coding’ (AV-PPC), a multi-task learning framework designed to fully leverage temporal and cross-modal correspondences as natural supervision signals. In AV-PPC, the model is trained to simultaneously solve multiple intra- and cross-modal predictive coding sub-tasks. Using visual speech recognition (lip-reading) as the downstream evaluation task, we show that our proposed proxy task learns higher-quality visual features than existing proxy tasks. We also show that AV-PPC visual features are highly data-efficient. Without further fine-tuning, the AV-PPC visual encoder achieves an 80.30% spoken-word classification rate on the LRW dataset, performing on par with directly supervised visual encoders learned from large amounts of labeled data.
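
To make the multi-task idea concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of how an intra-modal sub-task (predicting future visual embeddings from visual context) and a cross-modal sub-task (predicting future audio embeddings from the same visual context) might be combined into a single objective. The function names, tensor shapes, InfoNCE formulation, and loss weighting are illustrative assumptions only.

```python
# Hypothetical sketch: combining intra-modal (temporal) and cross-modal
# (audio-visual) contrastive predictive-coding losses in one multi-task
# objective. Names, dimensions, and weights are assumptions, not AV-PPC itself.
import torch
import torch.nn.functional as F


def info_nce(queries: torch.Tensor, keys: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE loss: each query's positive key is the key at the same batch index;
    all other keys in the batch serve as negatives."""
    queries = F.normalize(queries, dim=-1)
    keys = F.normalize(keys, dim=-1)
    logits = queries @ keys.t() / temperature            # (B, B) similarity matrix
    targets = torch.arange(queries.size(0), device=queries.device)
    return F.cross_entropy(logits, targets)


def multitask_ppc_loss(
    visual_context: torch.Tensor,   # (B, D) visual context at time t
    visual_future: torch.Tensor,    # (B, D) visual embedding at time t + k
    audio_future: torch.Tensor,     # (B, D) audio embedding at time t + k
    lambda_cross: float = 1.0,      # assumed weight for the cross-modal sub-task
) -> torch.Tensor:
    """Sum of an intra-modal term (temporal correspondence within the visual
    stream) and a cross-modal term (audio-visual correspondence)."""
    intra = info_nce(visual_context, visual_future)
    cross = info_nce(visual_context, audio_future)
    return intra + lambda_cross * cross


if __name__ == "__main__":
    B, D = 8, 256
    ctx, vis, aud = (torch.randn(B, D) for _ in range(3))
    print(multitask_ppc_loss(ctx, vis, aud).item())
```

The design choice this illustrates is that both correspondences act on a shared visual context vector, so the visual encoder receives gradient signal from temporal and audio-visual supervision simultaneously.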
