Abstract

Speech separation is the task of extracting a target speech signal from a mixture. Speaker-independent multi-talker speech separation is particularly challenging because both the target and the interfering speech in the mixture are unpredictable. Conventionally, speech separation has been treated as a signal processing problem, but it has recently been formulated as a deep learning problem in which discriminative patterns of speech are learned from training data. In this paper, we treat the ideal binary mask (IBM) as a supervised binary classification training target and estimate it with fully connected deep neural networks (DNNs) for single-channel speaker-independent multi-talker speech separation. The trained DNNs estimate the IBM training target, with the mean square error (MSE) serving as the objective cost function. Standard backpropagation and Monte-Carlo dropout regularization are used to improve generalization and reduce overfitting during training. The estimated mask is applied to the mixture to obtain the separated target speech. We also address the over-smoothing problem by equalizing the spectral variances of the estimated and clean speech features. Experimental results under various evaluation conditions show that the proposed method outperforms competing methods in terms of the Perceptual Evaluation of Speech Quality (PESQ), segmental SNR (SNRSeg), short-time objective intelligibility (STOI), normalized frequency-weighted SNRSeg (nFwSNRSeg), and HIT-FA rate.
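For reference, the IBM training target is conventionally defined over time-frequency units from the premixed target and interference signals. The sketch below gives the standard formulation with a local criterion LC (commonly 0 dB); the abstract does not state the exact threshold used in this work.

% Standard IBM definition (assumed formulation; LC is the local
% SNR criterion, and S(t,f), N(t,f) are the STFT representations
% of the target and interference signals, respectively).
\[
\mathrm{IBM}(t,f) =
\begin{cases}
1, & 10\log_{10}\!\left(\dfrac{|S(t,f)|^{2}}{|N(t,f)|^{2}}\right) > \mathrm{LC},\\[6pt]
0, & \text{otherwise}.
\end{cases}
\]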
