State-of-the-Art Analysis of Deep Learning-Based Monaural Speech Source Separation Techniques

Swati Soni, Lalita Gupta, Ram Narayan Yadav

doi:10.1109/access.2023.3235010

Open Access

https://doi.org/10.1109/access.2023.3235010

Journal: IEEE Access
Publication Date: Jan 1, 2023
Citations: 7
License type: CC BY 4.0

Affiliation: Maulana Azad National Institute of Technology

Abstract

Monaural speech source separation is an important problem in signal processing. The recent integration of deep learning algorithms with signal processing has achieved remarkable performance improvements on speech source separation problems. This paper explores numerous state-of-the-art deep learning-based monaural speech source separation algorithms in the time-frequency (T-F), time, and hybrid domains. The motivation, algorithm, and framework of different deep learning models for monaural speech source separation are analyzed. The benchmarked algorithms in the T-F domain can be categorized as deep neural network (DNN), clustering, permutation, multi-task learning, computational auditory scene analysis (CASA), and phase reconstruction-based techniques, whereas the state-of-the-art time-domain approaches can be categorized as CNN, RNN, multi-scale fusion (MSF), and transformer-based techniques. The end-to-end post filter (E2EPF) is a hybrid algorithm that combines T-F and time-domain processing to achieve enhanced results. Time-domain models have shown improved separation performance compared with the T-F and hybrid domain models while keeping model sizes small.
Methods in the T-F, time, and hybrid domains are compared using SDR, SI-SDR, SI-SNR, PESQ, and STOI as quality assessment metrics on some benchmark datasets.
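
As an illustration of one of the metrics named above, the following is a minimal sketch of SI-SDR for a single estimated source, assuming the commonly used definition in which the estimate is projected onto the reference before the energy ratio is taken. The function name `si_sdr`, the `zero_mean` option, and the `eps` stabilizer are choices made for this sketch; the abstract does not state which exact formulation the paper uses.

```python
import numpy as np


def si_sdr(estimate, reference, zero_mean=True, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (illustrative sketch).

    Assumes 1-D signals of equal length. `eps` is a small constant added
    for numerical stability; it is not part of the metric's definition.
    """
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    if estimate.ndim != 1 or estimate.shape != reference.shape:
        raise ValueError("estimate and reference must be 1-D and equal length")

    if zero_mean:
        # Remove any DC offset so it does not count as signal or distortion.
        estimate = estimate - estimate.mean()
        reference = reference - reference.mean()

    # Project the estimate onto the reference: the optimal scaling makes the
    # result insensitive to the overall gain of the estimate.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference

    # Everything in the estimate that is not the scaled reference is distortion.
    distortion = estimate - target

    ratio = (np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)
    return 10.0 * np.log10(ratio)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)                  # stand-in for a speech signal
    separated = clean + 0.1 * rng.standard_normal(16000)  # stand-in for a model output
    print(f"SI-SDR: {si_sdr(separated, clean):.2f} dB")
    # Rescaling the estimate should leave the score essentially unchanged.
    print(f"SI-SDR (estimate x3): {si_sdr(3.0 * separated, clean):.2f} dB")
```

With `zero_mean=True` this matches the form usually reported as SI-SNR in the time-domain separation literature. For multi-speaker mixtures, a score like this is typically computed for each speaker assignment and the best assignment is kept, which is not shown here.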

Similar Papers

Deep Multi-channel Speech Source Separation with Time-frequency Masking for Spatially Filtered Microphone Input Signal
Masahito Togami
24 Jan 2021

Stochastic Online Dictionary Learning for Speech Source Localization and Separation in Spherical Harmonic Domain
Vishnuvardhan Varanasi ... Rajesh Hegde
01 Apr 2018

A robust physiology-based source separation method for QRS detection in low amplitude fetal ECG recordings
R Vullings ... P F F Wijn
Physiological Measurement | VOL. 31
07 Jun 2010

A nonlinear frequency‐domain beamformer for underdetermined speech mixtures
Michael Davies ... Mohammad Dmour
The Journal of the Acoustical Society of America | VOL. 123
01 May 2008
