RemixIT: Continual Self-Training of Speech Enhancement Models via Bootstrapped Remixing

Efthymios Tzinis, Buye Xu, Yossi Adi, Anurag Kumar, Vamsi K Ithapu, Paris Smaragdis

doi:10.1109/jstsp.2022.3200911

Abstract

We present RemixIT, a simple yet effective self-supervised method for training speech enhancement models without requiring a single isolated in-domain speech or noise waveform. Our approach overcomes a limitation of previous methods, which depend on clean in-domain target signals and are therefore sensitive to any domain mismatch between training and test samples. RemixIT is based on a continual self-training scheme in which a teacher model, pre-trained on out-of-domain data, infers estimated pseudo-target signals for in-domain mixtures. Then, by permuting the estimated clean speech and noise signals and remixing them together, we generate a new set of bootstrapped mixtures and corresponding pseudo-targets, which are used to train the student network. In turn, the teacher periodically refines its estimates using the updated parameters of the latest student models. Experimental results on multiple speech enhancement datasets and tasks not only show the superiority of our method over prior approaches but also show that RemixIT can be combined with any separation model and applied to any semi-supervised or unsupervised domain adaptation task. Our analysis, paired with empirical evidence, sheds light on the inner workings of our self-training scheme, in which the student model keeps improving while observing severely degraded pseudo-targets.
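The bootstrapped remixing step and the periodic teacher refresh described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `teacher` callable (returning a speech estimate and a noise estimate per mixture), the choice to shuffle only the noise estimates across the batch, and the exponential-moving-average teacher update with a hypothetical factor `gamma` are all assumptions made for the example. Signals are plain Python lists so the sketch has no dependencies.

```python
import random
from typing import Callable, List, Sequence, Tuple

Signal = List[float]


def bootstrap_remix(
    mixtures: Sequence[Signal],
    teacher: Callable[[Signal], Tuple[Signal, Signal]],
    rng: random.Random,
) -> Tuple[List[Signal], List[Signal], List[Signal]]:
    """Build bootstrapped mixtures and pseudo-targets for one batch.

    `teacher` is a hypothetical callable mapping an in-domain mixture to
    (speech_estimate, noise_estimate).
    """
    speech_est: List[Signal] = []
    noise_est: List[Signal] = []
    for mixture in mixtures:
        speech, noise = teacher(mixture)
        speech_est.append(speech)
        noise_est.append(noise)

    # Assumption: permute only the noise estimates across the batch, so each
    # speech estimate is paired with a (generally different) noise estimate.
    perm = list(range(len(mixtures)))
    rng.shuffle(perm)
    noise_perm = [noise_est[p] for p in perm]

    # Remix: each new mixture is the sum of a speech and a permuted noise estimate.
    remixed = [
        [s_i + n_i for s_i, n_i in zip(speech, noise)]
        for speech, noise in zip(speech_est, noise_perm)
    ]
    # The student is trained on `remixed` with (speech_est, noise_perm) as pseudo-targets.
    return remixed, speech_est, noise_perm


def ema_update(
    teacher_params: List[float], student_params: List[float], gamma: float
) -> List[float]:
    """Hypothetical teacher refresh: moving average toward the student weights."""
    return [gamma * t + (1.0 - gamma) * s for t, s in zip(teacher_params, student_params)]
```

For example, with a toy teacher `lambda m: ([0.8 * x for x in m], [0.2 * x for x in m])`, every remixed signal equals its speech pseudo-target plus its permuted noise pseudo-target. Replacing the teacher outright with the latest student every few epochs (a special case of `ema_update` with `gamma = 0`) is an equally simple alternative refresh rule.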

Journal: IEEE Journal of Selected Topics in Signal Processing	Publication Date: Oct 1, 2022
Citations: 21	License type: CC BY 4.0

Similar Papers

Continual Self-Training With Bootstrapped Remixing For Speech Enhancement
Efthymios Tzinis ... Anurag Kumar
23 May 2022

Speech enhancement based on a modified spectral subtraction method
Md T Islam ... C Shahnaz
01 Aug 2014

Multichannel Speech Enhancement by Raw Waveform-Mapping Using Fully Convolutional Networks
Chang-Le Liu ... Jen-Wei Huang
IEEE/ACM Transactions on Audio, Speech, and Language Processing | VOL. 28 | 01 Jan 2020

Noise Reduction Based Random Matrix Theory
X Lu ... T Shimizu
01 Dec 2008