Abstract

Traditional stereophonic acoustic echo cancellation (SAEC) algorithms need to estimate the acoustic echo paths from the stereo loudspeakers to a microphone, which often suffers from the nonuniqueness problem caused by the high correlation between the two far-end signals of these stereo loudspeakers. Many decorrelation methods have been proposed to mitigate this problem; however, they may degrade the audio quality and/or the stereophonic spatial perception. This paper proposes a convolutional recurrent network (CRN) that suppresses the stereophonic echo components by estimating a nonlinear gain, which is then multiplied by the complex spectrum of the microphone signal to obtain the estimated near-end speech without any decorrelation procedure. The CRN consists of an encoder-decoder module and a two-layer gated recurrent network module, so it can exploit the feature extraction capability of convolutional neural networks and the temporal modeling capability of recurrent neural networks simultaneously. The magnitude spectra of the two far-end signals are used directly as input features without any decorrelation preprocessing, and thus both the audio quality and the stereophonic spatial perception can be maintained. Experimental results in both simulated and real acoustic environments show that the proposed algorithm outperforms traditional algorithms such as the normalized least-mean-square (NLMS) and Wiener algorithms, especially at low signal-to-echo ratios and high reverberation times (RT60).
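The core operation described in the abstract — multiplying an estimated real-valued gain by the complex spectrum of the microphone signal — can be sketched in a few lines. The sketch below uses a random toy spectrogram and random gain values as stand-ins for an STFT and the CRN's output; only the masking step itself reflects the paper's method.

```python
import numpy as np

def apply_gain_mask(mic_stft, gain):
    """Scale the complex microphone spectrum by a real-valued gain
    per time-frequency bin: magnitudes shrink, phases are kept."""
    return gain * mic_stft

rng = np.random.default_rng(0)
# toy complex spectrogram: 161 frequency bins x 10 time frames
mic = rng.standard_normal((161, 10)) + 1j * rng.standard_normal((161, 10))
# stand-in for the CRN's estimated gain (strictly positive here)
gain = rng.uniform(0.1, 1.0, size=(161, 10))

est = apply_gain_mask(mic, gain)
# magnitude is scaled by the gain, phase is unchanged
assert np.allclose(np.abs(est), gain * np.abs(mic))
assert np.allclose(np.angle(est), np.angle(mic))
```

Because the gain is real and positive, the enhanced signal reuses the microphone phase; the near-end waveform would then be recovered by an inverse STFT.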

Highlights

  • In practical hands-free teleconferencing systems, a stereophonic communication system is often necessary to provide a realistic experience that a single-channel system cannot offer

  • This paper proposes to use a convolutional recurrent network (CRN) to suppress the stereophonic echo components by estimating a nonlinear gain, which is multiplied by the complex spectrum of the microphone signal to obtain the estimated near-end speech without a decorrelation procedure

  • A deep learning approach is proposed to solve the SAEC problem because deep neural networks (DNNs) can efficiently model the nonlinear relationships between high-dimensional vectors, which removes the need for both the decorrelation procedure and the double-talk detectors (DTDs) used in traditional SAEC algorithms


Summary

INTRODUCTION

In practical hands-free teleconferencing systems, a stereophonic communication system is often necessary to provide a realistic experience that a single-channel system cannot offer. Traditional SAEC methods usually suppress the echo by estimating the acoustic echo paths between the stereophonic loudspeakers and the microphones using adaptive filters. In such a case, two echo paths need to be identified for each microphone because there are two far-end signals. A commonly used way to mitigate the nonuniqueness problem in adaptive filtering-based SAEC algorithms is to decorrelate the two far-end signals (Benesty et al., 1999). A modified stereophonic acoustic echo suppression (SAES) method incorporating the spectral and temporal correlations in the STFT domain was proposed by Lee et al. (2014), which considered the adjacent time-frequency (TF) bins of the far-end signals. Note that these Wiener filter-based SAES algorithms suppress the echo signal directly through an estimated gain function in each TF bin; they do not need to estimate the two echo paths of SAEC exactly.

SIGNAL MODEL
PROPOSED CRN-BASED SAES
Feature extraction
Training targets and signal reconstruction
Model architecture
Experiment setting
Encoder feature-map dimensions per layer: 16 × T × 80, 32 × T × 39, 64 × T × 19, 128 × T × 9, 256 × T × 4
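The frequency dimensions in the feature-map sizes above (80, 39, 19, 9, 4) are consistent with a five-layer convolutional encoder using a frequency kernel of 3 and stride of 2 on a 161-bin STFT input — a common CRN configuration, though the exact hyperparameters are an assumption here. The standard convolution output-size formula reproduces the listed sizes:

```python
def conv_out(size, kernel=3, stride=2, pad=0):
    """Standard convolution output-size formula along one axis."""
    return (size + 2 * pad - kernel) // stride + 1

freq = 161  # assumed number of STFT frequency bins at the encoder input
dims = []
for channels in (16, 32, 64, 128, 256):  # channels per encoder layer
    freq = conv_out(freq)
    dims.append((channels, freq))

print(dims)  # [(16, 80), (32, 39), (64, 19), (128, 9), (256, 4)]
```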
Performance evaluation in high SNR scenarios
Performance evaluation in low SNR scenarios
CONCLUSION