Abstract

Deep neural networks (DNNs) have been used for dereverberation and separation in the monaural source separation problem. However, the performance of current state-of-the-art methods is limited, particularly when applied in highly reverberant room environments. In this paper, we propose a two-stage approach with two DNN-based methods to address this problem. In the first stage, the dereverberation of the speech mixture is achieved with the proposed dereverberation mask (DM). In the second stage, the dereverberant speech mixture is separated with the ideal ratio mask (IRM). To realize this two-stage approach, in the first DNN-based method, the DM is integrated with the IRM to generate the enhanced time-frequency (T-F) mask, namely the ideal enhanced mask (IEM), as the training target for a single DNN. In the second DNN-based method, the DM and the IRM are predicted with two individual DNNs. The IEEE and TIMIT corpora with real room impulse responses and noise from the NOISEX dataset are used to generate speech mixtures for evaluation. The proposed methods outperform the state-of-the-art, particularly in highly reverberant room environments.
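As a concrete illustration of the mask pipeline described above, the sketch below computes an IRM, a DM and an IEM from T-F magnitudes. This is a minimal NumPy sketch, not the paper's exact formulation: it assumes the common magnitude-ratio definition of the IRM, assumes the DM is the ratio of the direct-path (dereverberant) mixture magnitude to the reverberant mixture magnitude, and assumes the IEM is the element-wise product of the two masks; the function names and the random test magnitudes are illustrative only.

```python
import numpy as np

EPS = 1e-8  # avoid division by zero

def ideal_ratio_mask(speech_mag, noise_mag):
    """Common magnitude-ratio IRM from target-speech and interference magnitudes."""
    return np.sqrt(speech_mag**2 / (speech_mag**2 + noise_mag**2 + EPS))

def dereverberation_mask(direct_mix_mag, reverb_mix_mag):
    """Assumed DM: direct-path mixture magnitude over reverberant mixture magnitude."""
    return np.clip(direct_mix_mag / (reverb_mix_mag + EPS), 0.0, 1.0)

def ideal_enhanced_mask(dm, irm):
    """Assumed IEM: element-wise product of DM and IRM, usable as a single DNN training target."""
    return dm * irm

# Toy T-F magnitudes (frames x frequency bins) standing in for real STFTs.
rng = np.random.default_rng(0)
T, F = 100, 257
speech = np.abs(rng.standard_normal((T, F)))
noise = np.abs(rng.standard_normal((T, F)))
direct_mix = speech + noise
reverb_mix = direct_mix + np.abs(0.5 * rng.standard_normal((T, F)))  # extra reverberant energy

iem = ideal_enhanced_mask(dereverberation_mask(direct_mix, reverb_mix),
                          ideal_ratio_mask(speech, noise))
print(iem.shape, iem.min() >= 0.0, iem.max() <= 1.0)  # (100, 257) True True
```

Under these assumptions, the first proposed method trains a single DNN to predict the combined mask (the IEM), whereas the second method predicts the DM and the IRM with two separate DNNs and applies them in sequence.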

Highlights

  • Source separation aims to separate the desired speech signals from the mixture, which consists of the speech sources, the background interference and their reflections

  • Compared with the ideal ratio mask (IRM)- and the complex IRM (cIRM)-based deep neural network (DNN) methods, both of our proposed methods consistently provide improved performance in terms of SNRfw and source-to-distortion ratio (SDR); a sketch of how SDR is computed follows these highlights

  • When the room impulse responses (RIRs) are unseen, the generalization ability of the proposed methods is evaluated; the results shown in Figures 7, 8, 11 and 12 and Tables VI and VIII confirm that the proposed methods better separate the desired speech signal from the mixture than the IRM- and cIRM-based methods
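As a reference for the SDR figures cited in these highlights, the snippet below shows how SDR is commonly computed with the BSS Eval metrics via the mir_eval package. This is a tooling assumption for illustration, not necessarily the implementation used by the authors, and the test signals are synthetic.

```python
import numpy as np
from mir_eval.separation import bss_eval_sources

fs = 16000                                             # 16 kHz sampling rate, typical for these corpora
t = np.arange(fs) / fs                                 # one second of audio
reference = np.sin(2 * np.pi * 220.0 * t)[np.newaxis, :]   # (n_sources, n_samples) reference source
estimate = reference + 0.05 * np.random.randn(1, fs)       # separated estimate with residual error

# bss_eval_sources returns SDR, SIR, SAR (in dB) and the best source permutation.
sdr, sir, sar, perm = bss_eval_sources(reference, estimate)
print(f"SDR = {sdr[0]:.2f} dB")
```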


Introduction

Source separation aims to separate the desired speech signals from the mixture, which consists of the speech sources, the background interference and their reflections. Due to applications such as automatic speech recognition (ASR), assisted living systems and hearing aids [1]–[6], source separation in real-world scenarios has attracted considerable research attention. The source separation problem is categorized into multichannel, stereo-channel (binaural) and single-channel (monaural) cases. In the monaural case, only one recording is available, and the spatial information cannot generally be extracted. In real-world room environments, reverberation is particularly challenging, as it distorts the received mixture and degrades the separation performance [7].
