Time–Frequency Masking Based Online Multi-Channel Speech Enhancement With Convolutional Recurrent Neural Networks

Soumitro Chakrabarty, Emanuel A. P. Habets

doi:10.1109/jstsp.2019.2911401

Abstract

This paper presents an online multi-channel speech enhancement approach based on time–frequency masking, in which a convolutional recurrent neural network estimates the mask. The magnitude and phase components of the short-time Fourier transform coefficients for multiple time frames are provided as input, so that the network can discriminate between the directional speech and the noise components based on the spatial characteristics of the individual signals as well as their spectro-temporal structure. The estimation of two different masks, namely the ideal ratio mask (IRM) and the ideal binary mask (IBM), is discussed, along with two approaches for incorporating the mask to obtain the desired signal. In the first approach, the mask is applied directly as a real-valued gain to a reference microphone signal; in the second, the mask is used as an activity indicator for the recursive update of the power spectral density (PSD) matrices used within a beamformer. The proposed system, with each estimated mask used within each enhancement approach, is evaluated with both simulated and measured room impulse responses. The results show that the IBM is better suited as an indicator for the PSD updates, whereas direct application of the IRM as a real-valued gain yields a larger improvement in short-time objective intelligibility. An analysis of the system's performance also demonstrates its robustness to different angular positions of the speech source.
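
The two ways of using a mask described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: the IRM is taken as speech power over total power (one common definition), the IBM threshold is 0 dB, the smoothing factor `alpha` is 0.95, and the beamformer is an MVDR beamformer computed from the speech and noise PSD matrices. All function names are hypothetical. In the paper the mask is estimated by a network; here the ideal masks are computed from known speech and noise powers purely to keep the example self-contained.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero

def ideal_ratio_mask(speech_power, noise_power):
    # Assumption: IRM defined as speech power over total power, in [0, 1].
    return speech_power / (speech_power + noise_power + EPS)

def ideal_binary_mask(speech_power, noise_power, threshold_db=0.0):
    # Assumption: 1 where the local SNR exceeds threshold_db, else 0.
    snr_db = 10.0 * np.log10((speech_power + EPS) / (noise_power + EPS))
    return (snr_db > threshold_db).astype(float)

def apply_mask_as_gain(ref_stft, mask):
    # First approach: scale the reference-microphone STFT by the real mask.
    return mask * ref_stft

def update_psd(psd, frame, weight, alpha=0.95):
    # Second approach: recursive PSD update for one frame, gated by `weight`.
    # psd: (F, M, M), frame: (F, M), weight: (F,) with values in [0, 1].
    # weight = 1 gives alpha * psd + (1 - alpha) * x x^H; weight = 0 keeps psd.
    outer = frame[:, :, None] * frame[:, None, :].conj()
    step = (weight * (1.0 - alpha))[:, None, None]
    return psd + step * (outer - psd)

def mvdr_weights(psd_speech, psd_noise, ref=0, loading=1e-6):
    # Assumption: MVDR in the form Phi_n^{-1} Phi_s u_ref / trace(Phi_n^{-1} Phi_s),
    # with a small diagonal loading so the noise PSD stays invertible.
    n_mics = psd_noise.shape[-1]
    loaded = psd_noise + loading * np.eye(n_mics)
    numerator = np.linalg.solve(loaded, psd_speech)
    trace = np.trace(numerator, axis1=-2, axis2=-1)
    return numerator[..., ref] / (trace[..., None] + EPS)

def beamform(weights, frame):
    # Beamformer output w^H x for every frequency bin.
    return np.einsum("fm,fm->f", weights.conj(), frame)

# Usage on one synthetic frame with F = 4 frequency bins and M = 3 microphones.
rng = np.random.default_rng(0)
F, M = 4, 3
frame = rng.standard_normal((F, M)) + 1j * rng.standard_normal((F, M))
mask = ideal_binary_mask(np.array([3.0, 0.5, 2.0, 0.1]), np.ones(F))

gain_output = apply_mask_as_gain(frame[:, 0], mask)

psd_s = update_psd(np.zeros((F, M, M), dtype=complex), frame, mask)
psd_n = update_psd(np.tile(np.eye(M, dtype=complex), (F, 1, 1)), frame, 1.0 - mask)
enhanced = beamform(mvdr_weights(psd_s, psd_n), frame)
```

In this sketch the speech PSD is updated with the mask as its weight and the noise PSD with one minus the mask, so a binary mask switches each update fully on or off per frequency bin, while a ratio mask would blend the two.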

Journal: IEEE Journal of Selected Topics in Signal Processing
Publication Date: Aug 1, 2019
Citations: 89

Similar Papers

Review of Ideal Binary and Ratio Mask Estimation Techniques for Monaural Speech Separation
T M Minipriya ... R Rajavel
01 Feb 2018

Improved Mask-Based Neural Beamforming for Multichannel Speech Enhancement by Snapshot Matching Masking
Ching-Hua Lee ... Hongxia Jin
04 Jun 2023

Comparison of ideal mask-based speech enhancement algorithms for speech mixed with white noise at low mixture signal-to-noise ratios
Simone Graetzer ... Carl Hopkins
The Journal of the Acoustical Society of America | VOL. 152
01 Dec 2022

On the Ideal Ratio Mask as the Goal of Computational Auditory Scene Analysis
Christopher Hummersone ... Tim Brookes
01 Jan 2014
