Variants of LSTM cells for single-channel speaker-conditioned target speaker extraction

Ragini Sinha, Christian Rollwage, Simon Doclo

doi:10.1186/s13636-024-00384-0

Abstract

Speaker-conditioned target speaker extraction aims at estimating the target speaker's speech from a mixture of speakers using auxiliary information about the target speaker. In this paper, we consider a single-channel target speaker extraction system consisting of a speaker embedder network and a speaker separator network. Instead of using standard long short-term memory (LSTM) cells in the separator network, we propose two variants of LSTM cells that are customized for speaker-conditioned target speaker extraction. The first variant customizes both the forget gate and the input gate of the LSTM cell, aiming at retaining only relevant features related to the target speaker and disregarding the interfering speakers by simultaneously resetting and updating the cell state using the speaker embedding. For the second variant, we introduce a new gate within the LSTM cell, referred to as the auxiliary-modulation gate. This gate modulates the information processing during the cell state reset, aiming at learning the long-term and short-term discriminative features of the target speaker. In both unidirectional and bidirectional mode, experimental results on 2-speaker mixtures, 3-speaker mixtures, and noisy mixtures (containing 1, 2, or 3 speakers) show that both proposed variants of LSTM cells outperform the standard LSTM cells for target speaker extraction, where the best performance is obtained using the auxiliary-gated LSTM cells.
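The abstract describes both cell variants only in words. As a rough illustration, the sketch below runs one LSTM cell over a sequence of mixture frames in three hypothetical modes: `standard` (no speaker conditioning); `speaker_gated`, where the forget and input gates also receive a speaker embedding `e`; and `aux_gated`, where an extra gate `a` computed from `[x; h; e]` scales the forget gate during the cell-state reset. Conditioning by concatenation, the gate equations, the weight initialization, and all function names are assumptions made for illustration, not the equations or code from the paper.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Return W @ v + b for a weight matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bk for row, bk in zip(W, b)]


def make_gate(n_in, n_out, rng):
    """Small random weights and zero bias for one gate (illustrative init)."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def build_gates(n_x, n_h, n_e, variant, rng):
    n_xh, n_xhe = n_x + n_h, n_x + n_h + n_e
    # Assumed variant 1: forget/input gates also see the speaker embedding.
    cond_in = n_xhe if variant == "speaker_gated" else n_xh
    gates = {
        "f": make_gate(cond_in, n_h, rng),
        "i": make_gate(cond_in, n_h, rng),
        "g": make_gate(n_xh, n_h, rng),
        "o": make_gate(n_xh, n_h, rng),
    }
    if variant == "aux_gated":
        # Assumed variant 2: an extra gate driven by [x; h; e].
        gates["a"] = make_gate(n_xhe, n_h, rng)
    return gates


def lstm_step(x, h, c, e, gates, variant):
    xh, xhe = x + h, x + h + e  # list concatenation: [x; h] and [x; h; e]
    cond = xhe if variant == "speaker_gated" else xh
    f = [sigmoid(z) for z in affine(*gates["f"], cond)]  # forget gate (reset)
    i = [sigmoid(z) for z in affine(*gates["i"], cond)]  # input gate (update)
    g = [math.tanh(z) for z in affine(*gates["g"], xh)]  # candidate values
    o = [sigmoid(z) for z in affine(*gates["o"], xh)]  # output gate
    if variant == "aux_gated":
        a = [sigmoid(z) for z in affine(*gates["a"], xhe)]
        f = [ak * fk for ak, fk in zip(a, f)]  # modulate the reset term
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def run(frames, e, n_h, variant, seed=0):
    """Unidirectional pass over a list of feature frames."""
    rng = random.Random(seed)
    gates = build_gates(len(frames[0]), n_h, len(e), variant, rng)
    h, c = [0.0] * n_h, [0.0] * n_h
    outputs = []
    for x in frames:
        h, c = lstm_step(x, h, c, e, gates, variant)
        outputs.append(h)
    return outputs


# Usage: 10 mixture frames with 4 features and a 3-dim embedding
# (both randomly generated stand-ins, not real features).
data_rng = random.Random(1)
frames = [[data_rng.gauss(0, 1) for _ in range(4)] for _ in range(10)]
emb = [data_rng.gauss(0, 1) for _ in range(3)]
for v in ("standard", "speaker_gated", "aux_gated"):
    out = run(frames, emb, n_h=5, variant=v)
    print(v, len(out), len(out[0]))  # 10 steps of a 5-dim hidden state
```

Because both `a` and `f` lie in (0, 1), multiplying the forget gate by `a` can only shrink, never amplify, the retained cell state in this sketch.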

More From: EURASIP Journal on Audio, Speech, and Music Processing

Journal: EURASIP Journal on Audio, Speech, and Music Processing
Publication Date: Dec 2, 2024
License type: CC BY 4.0

Similar Papers

Time Matters: Time-Aware LSTMs for Predictive Business Process Monitoring
An Nguyen ... Leo Schwinn
01 Jan 2020

Comparative analysis of Gated Recurrent Units (GRU), long Short-Term memory (LSTM) cells, autoregressive Integrated moving average (ARIMA), seasonal autoregressive Integrated moving average (SARIMA) for forecasting COVID-19 trends
K.E Arunkumar ... Timothy M Brenza
Alexandria Engineering Journal | VOL. 61
06 Jan 2022

Image segmentation in marine environments using convolutional LSTM for temporal context
Kasper Foss Hansen ... Yuanchang Liu
Applied Ocean Research | VOL. 139
26 Aug 2023

LSTM-XL: Attention Enhanced Long-Term Memory for LSTM Cells
Tamás Grósz ... Mikko Kurimo
01 Jan 2020