EXPLORING SELF-SUPERVISED CONTRASTIVE LEARNING OF SPATIAL SOUND EVENT REPRESENTATION.

Xilin Jiang, Cong Han, Yinghao Aaron Li, Nima Mesgarani

doi:10.1109/icassp48485.2024.10447391

Abstract

In this study, we present a simple multi-channel framework for contrastive learning (MC-SimCLR) to encode the 'what' and 'where' of spatial audio. MC-SimCLR learns joint spectral and spatial representations from unlabeled spatial audio, thereby enhancing both event classification and sound localization in downstream tasks. At its core, we propose a multi-level data augmentation pipeline that augments audio features at several levels: waveforms, Mel spectrograms, and generalized cross-correlation (GCC) features. In addition, we introduce simple yet effective channel-wise augmentation methods that randomly swap the order of the microphones and mask Mel and GCC channels. Using these augmentations, we find that linear layers on top of the learned representation significantly outperform supervised models in terms of both event classification accuracy and localization error. We also analyze the effect of each augmentation method in detail and compare fine-tuning performance across different amounts of labeled data.
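The two channel-wise augmentations named in the abstract (random microphone swapping and channel masking) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `swap_microphones` and `mask_channels`, the `mask_prob` value, and the toy list-based signals are all assumptions made for illustration, and the abstract does not say how masking probabilities are chosen or how location information is handled after a swap.

```python
import random

def swap_microphones(channels, rng):
    """Return the channels in a random microphone order, plus the permutation used.

    Hypothetical helper: `channels` is a list with one sample list per microphone.
    """
    perm = list(range(len(channels)))
    rng.shuffle(perm)
    return [channels[i] for i in perm], perm

def mask_channels(channels, rng, mask_prob=0.25):
    """Zero out whole feature channels, each with probability `mask_prob`.

    Hypothetical helper: at least one channel is always kept, which is an
    assumption of this sketch rather than something stated in the abstract.
    """
    keep = [rng.random() >= mask_prob for _ in channels]
    if not any(keep):
        keep[rng.randrange(len(channels))] = True
    masked = [ch if k else [0.0] * len(ch) for ch, k in zip(channels, keep)]
    return masked, keep

rng = random.Random(0)

# Toy two-microphone waveform with 8 samples per microphone.
wave = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(2)]
swapped, perm = swap_microphones(wave, rng)

# Toy stack of 4 flattened feature channels (standing in for Mel/GCC channels).
feats = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(4)]
masked, keep = mask_channels(feats, rng)
```

Returning the permutation and the keep mask alongside the augmented data lets a caller adjust any location-dependent targets to match, should the training setup require it.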

More From: Proceedings of the ... IEEE International Conference on Acoustics, Speech, and Signal Processing. ICASSP (Conference)