Abstract

Humans perceive a rich auditory experience, with distinct sounds heard by each ear. Videos recorded with binaural audio in particular simulate how humans receive ambient sound. However, a large number of videos contain only monaural audio, which degrades the user experience due to the lack of ambient spatial information. To address this issue, we propose an audio spatialization framework that converts a monaural video into a binaural one by exploiting the relationship between the audio and visual components. By preserving left-right consistency in both the audio and visual modalities, our learning strategy can be viewed as a self-supervised learning technique, and it alleviates the dependency on a large amount of video data with ground-truth binaural audio during training. Experiments on benchmark datasets confirm the effectiveness of our proposed framework in both semi-supervised and fully supervised scenarios, with ablation studies and visualizations further supporting the use of our model for audio spatialization.

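To make the left-right consistency idea concrete, below is a minimal PyTorch-style sketch of a consistency loss, assuming a hypothetical `model` that maps video frames and a mono spectrogram to predicted left/right spectrograms; the function name, interface, and tensor layout are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def left_right_consistency_loss(model, frames, mono_spec):
    """Hypothetical sketch of a self-supervised left-right consistency loss.

    Assumes `model(frames, mono_spec)` returns predicted (left, right)
    spectrograms and `frames` has shape (B, C, H, W). No ground-truth
    binaural audio is needed for this term.
    """
    # Predict left/right channels from the original frames.
    pred_l, pred_r = model(frames, mono_spec)

    # Horizontally mirror the visual scene (flip along the width axis).
    flipped_frames = torch.flip(frames, dims=[3])
    flip_l, flip_r = model(flipped_frames, mono_spec)

    # Mirroring the scene should swap the two audio channels.
    return F.l1_loss(flip_l, pred_r) + F.l1_loss(flip_r, pred_l)
```

The intuition is that mirroring the visual scene left-to-right should swap the predicted audio channels, giving a training signal that does not require ground-truth binaural recordings and can complement a supervised loss when such recordings are available.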