On Permutation Invariant Training For Speech Source Separation

Xiaoyu Liu, Jordi Pons

doi:10.1109/icassp39728.2021.9413559

Abstract

We study permutation invariant training (PIT), which addresses the permutation ambiguity problem in speaker-independent source separation models. We extend two state-of-the-art PIT strategies. First, we look at the two-stage speaker separation and tracking algorithm based on frame-level PIT (tPIT) and clustering, which was originally proposed for the STFT domain, and we adapt it to work with waveforms and over a learned latent space. Further, we propose an efficient clustering loss that scales to waveform models. Second, we extend a recently proposed auxiliary speaker-ID loss with a deep feature loss based on "problem agnostic speech features", to reduce the local permutation errors made by utterance-level PIT (uPIT). Our results show that the proposed extensions help reduce permutation ambiguity. However, we also note that the studied STFT-based models are more effective at reducing permutation errors than waveform-based models, a perspective overlooked in recent studies.
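To make the distinction between uPIT and tPIT concrete, the following is a minimal sketch and not the authors' implementation. It assumes a plain mean squared error on raw samples and non-overlapping frames; the function names `mse`, `upit_loss`, and `tpit_loss` and the toy signals are hypothetical and chosen only for illustration. uPIT picks one output-to-speaker assignment for the whole utterance, while tPIT picks the best assignment independently in each frame.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def upit_loss(estimates, targets):
    """Utterance-level PIT: a single permutation for the whole signal.

    Returns (loss, perm), where perm[i] is the target index assigned
    to output channel i. Ties keep the first permutation tried.
    """
    n = len(targets)
    best = None
    for perm in permutations(range(n)):
        loss = sum(mse(estimates[i], targets[p]) for i, p in enumerate(perm)) / n
        if best is None or loss < best[0]:
            best = (loss, perm)
    return best


def tpit_loss(estimates, targets, frame_len):
    """Frame-level PIT: the best permutation is chosen per frame.

    Assumes non-overlapping frames of frame_len samples (an
    illustrative simplification). Returns (mean loss, list of perms).
    """
    total, perms = 0.0, []
    for start in range(0, len(targets[0]), frame_len):
        est_f = [e[start:start + frame_len] for e in estimates]
        tgt_f = [t[start:start + frame_len] for t in targets]
        loss, perm = upit_loss(est_f, tgt_f)
        total += loss
        perms.append(perm)
    return total / len(perms), perms


# Toy case: the two output channels swap speakers halfway through.
targets = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
estimates = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]

u_loss, u_perm = upit_loss(estimates, targets)
t_loss, t_perms = tpit_loss(estimates, targets, frame_len=2)
print(u_loss, u_perm)    # 0.5 (0, 1)
print(t_loss, t_perms)   # 0.0 [(0, 1), (1, 0)]
```

In this toy case no single utterance-level assignment removes the error caused by the mid-utterance swap, which is the kind of local permutation error the abstract refers to. The frame-level loss drops to zero, but the assignment changes between frames, so a separate tracking step (clustering, in the two-stage approach described above) is needed to stitch frames into consistent speaker streams.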
