ONLINE BINAURAL SPEECH SEPARATION OF MOVING SPEAKERS WITH A WAVESPLIT NETWORK.

Cong Han, Nima Mesgarani

doi:10.1109/icassp49357.2023.10095695

Abstract

Binaural speech separation in real-world scenarios often involves moving speakers. Most current speech separation methods are trained with utterance-level permutation invariant training (u-PIT). At inference time, however, the order of the outputs can be inconsistent over time, particularly in long-form speech separation. This situation, referred to as the speaker swap problem, is even more problematic when speakers constantly move in space, and it therefore poses a challenge for the consistent placement of speakers in output channels. Here, we describe a real-time binaural speech separation model based on a Wavesplit network that mitigates the speaker swap problem for moving-speaker separation. Our model computes a speaker embedding for each speaker at each time frame from the mixed audio, aggregates the embeddings using online clustering, and uses the cluster centroids as speaker profiles to track each speaker throughout a long recording. Experimental results on reverberant, long-form, moving multitalker speech separation show that the proposed method is less prone to speaker swap and achieves performance comparable to u-PIT-based models with ground-truth tracking, both in separation accuracy and in preserving interaural cues.
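
The tracking idea in the abstract (per-frame speaker embeddings, online clustering, centroids used as speaker profiles) can be illustrated with a minimal sketch. The abstract does not specify the clustering rule, so everything below is an assumption for illustration only: the class name OnlineSpeakerTracker, the use of running-mean centroids, the squared Euclidean distance, and the toy 2-dimensional embeddings are all hypothetical and are not taken from the paper.

```python
import itertools


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class OnlineSpeakerTracker:
    """Hypothetical helper: one running-mean centroid per speaker slot."""

    def __init__(self, num_speakers):
        self.num_speakers = num_speakers
        self.centroids = None  # initialised from the first frame
        self.counts = [0] * num_speakers

    def update(self, frame_embeddings):
        """Assign one frame's unordered embeddings to speaker slots.

        Returns a tuple perm where perm[k] is the index of the frame
        embedding assigned to speaker slot k.
        """
        slots = range(self.num_speakers)
        if self.centroids is None:
            # Assumption: the first frame fixes the output order.
            self.centroids = [list(e) for e in frame_embeddings]
            self.counts = [1] * self.num_speakers
            return tuple(slots)

        # Pick the permutation with the smallest total distance to the
        # current centroids (exhaustive search; fine for 2-3 speakers).
        best = min(
            itertools.permutations(slots),
            key=lambda p: sum(
                sq_dist(self.centroids[k], frame_embeddings[p[k]])
                for k in slots
            ),
        )

        # Running-mean update of each centroid with its assigned embedding.
        for k, j in enumerate(best):
            self.counts[k] += 1
            step = 1.0 / self.counts[k]
            self.centroids[k] = [
                c + step * (x - c)
                for c, x in zip(self.centroids[k], frame_embeddings[j])
            ]
        return best


# Toy input: the second frame lists the two speakers in swapped order.
frames = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.1, 0.9], [0.9, 0.1]],
    [[0.8, 0.2], [0.2, 0.8]],
]
tracker = OnlineSpeakerTracker(num_speakers=2)
assignments = [tracker.update(f) for f in frames]
print(assignments)
```

In this toy run the tracker undoes the swap in the second frame, so each output slot keeps following the same speaker. A real system would operate on learned embeddings and might, for example, down-weight older frames so the centroids can adapt; that choice is not described in the abstract.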

Journal: Proceedings of the ... IEEE International Conference on Acoustics, Speech, and Signal Processing. ICASSP (Conference)
Publication Date: Jun 4, 2023
Citations: 1
