Enhancing Audio-Visual Association with Self-Supervised Curriculum Learning

Jingran Zhang, Fumin Shen, Huimin Lu, Heng Tao Shen, Xing Xu, Xin Liu

doi:10.1609/aaai.v35i4.16447

Abstract

The recent success of audio-visual representation learning can be largely attributed to the pervasive concurrency of the audio and visual modalities, which can serve as a self-supervision signal for extracting correlation information. While most recent works focus on capturing the shared associations between the audio and visual modalities, they rarely consider multiple audio and video pairs at once and pay little attention to exploiting the valuable information within each modality. To tackle this problem, we propose a novel audio-visual representation learning method, dubbed self-supervised curriculum learning (SSCL), that follows a teacher-student learning scheme. Specifically, taking advantage of contrastive learning, SSCL adopts a two-stage scheme that transfers cross-modal information between the teacher and student models as a phased process. The proposed SSCL approach treats the pervasive audio-visual concurrency as latent supervision and mutually distills the structural knowledge of visual data to audio data. Notably, the SSCL method can learn discriminative audio and visual representations for various downstream applications. Extensive experiments on both video action recognition and audio sound recognition tasks show that SSCL markedly improves performance over state-of-the-art self-supervised audio-visual representation learning methods.
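As an illustration of how audio-visual concurrency can act as a self-supervision signal over multiple pairs at once, the following is a minimal sketch of a generic cross-modal InfoNCE-style contrastive loss. It is not the SSCL objective itself, which the abstract does not specify. The function names, the temperature value, and the toy embeddings are all assumptions made for this example.

```python
import math

def l2_normalize(vec):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cross_modal_info_nce(video_emb, audio_emb, temperature=0.1):
    """Mean InfoNCE-style loss over a batch (hypothetical helper).

    Pair i is (video_emb[i], audio_emb[i]): the audio clip that co-occurs
    with video i is the positive, every other audio clip in the batch is
    a negative.
    """
    videos = [l2_normalize(v) for v in video_emb]
    audios = [l2_normalize(a) for a in audio_emb]
    losses = []
    for i, v in enumerate(videos):
        # Temperature-scaled similarity of video i to every audio clip.
        logits = [sum(p * q for p, q in zip(v, a)) / temperature for a in audios]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        # Negative log-probability assigned to the concurrent (positive) pair.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)

# Toy 2-D embeddings (made up): aligned pairs point in similar directions.
aligned_video = [[1.0, 0.0], [0.0, 1.0]]
aligned_audio = [[0.9, 0.1], [0.1, 0.9]]
shuffled_audio = [aligned_audio[1], aligned_audio[0]]

loss_aligned = cross_modal_info_nce(aligned_video, aligned_audio)
loss_shuffled = cross_modal_info_nce(aligned_video, shuffled_audio)
```

In this toy setup the loss is lower when each video is paired with its own audio than when the audio clips are shuffled, which is the property a contrastive objective relies on. A teacher-student variant would additionally hold one modality's encoder fixed while the other is trained, but that design is not detailed in the abstract and is left out here.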

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: May 18, 2021
Citations: 10

Similar Papers

Benchmarking Self-Supervised Contrastive Learning Methods for Image-Based Plant Phenotyping.
Franklin C Ogidi ... Ian Stavness
Plant phenomics (Washington, D.C.) | VOL. 5
01 Jan 2023

CS-CO: A Hybrid Self-Supervised Visual Representation Learning Method for H&E-stained Histopathological Images.
Pengshuai Yang ... Rui Jiang
Medical Image Analysis | VOL. 81
01 Oct 2022

Self-Supervised Visual Feature Learning With Deep Neural Networks: A Survey.
Longlong Jing ... Yingli Tian
IEEE Transactions on Pattern Analysis and Machine Intelligence | VOL. 43
04 May 2020

In-Distribution and Out-of-Distribution Self-Supervised ECG Representation Learning for Arrhythmia Detection.
Sahar Soltanieh ... Javad Hashemi
IEEE Journal of Biomedical and Health Informatics | VOL. 28
01 Feb 2024
