Abstract

The acoustic echo cannot be entirely removed by linear adaptive filters due to the nonlinear relationship between the echo and the far-end signal. Usually, a post-processing module is required to further suppress the echo. In this paper, we propose a residual echo suppression method based on a modified dual-path recurrent neural network (DPRNN) to improve the quality of speech communication. The residual signal and an auxiliary signal (either the far-end signal or the output of the adaptive filter), both obtained from the linear acoustic echo cancelation, are combined to form a dual-stream input to the DPRNN. We validate the efficacy of the proposed method in the notoriously difficult double-talk situations and discuss the impact of different auxiliary signals on performance. We also compare the performance of time-domain and time-frequency-domain processing. Furthermore, we propose an efficient and practical way to deploy our method to off-the-shelf loudspeakers by fine-tuning the pre-trained model with a small amount of recorded-echo data.

Highlights

  • The acoustic echo is generated from the coupling between the loudspeaker and the microphone in full-duplex handsfree telecommunication systems or smart speakers

  • Building on the fully convolutional time-domain audio separation network (Conv-TasNet) [18], we proposed a residual echo suppression (RES) method based on the multi-stream Conv-TasNet, where both the residual signal of the linear acoustic echo cancelation (LAEC) system and the output of the adaptive filter are adopted to form multiple streams [19] (source: Chen et al., EURASIP Journal on Audio, Speech, and Music Processing (2021) 2021:35)

  • Performance comparison (Section 4.1): we compare the proposed methods with some typical deep neural network (DNN)-based RES methods to validate the efficiency of our model

Summary

Introduction

The acoustic echo is generated from the coupling between the loudspeaker and the microphone in full-duplex handsfree telecommunication systems or smart speakers. Typical linear acoustic echo cancelation (LAEC) methods use adaptive algorithms to identify the impulse response between the loudspeaker and the microphone [1]. Time-domain least mean square (LMS) algorithms [2, 3] are often employed in delay-sensitive situations. Frequency-domain LMS algorithms are often utilized to guarantee both fast convergence speed and low computational load [2]. The frequency-domain adaptive Kalman filter (FDKF) [4] is a commonly used method with several efficient variations proposed recently [5, 6].
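To make the LAEC idea concrete, the following is a minimal sketch of a time-domain normalized LMS (NLMS) echo canceller in plain Python. It is an illustration only, not the adaptive filter used in the paper: the function names, the tap count, the step size, and the simulated echo path are all assumptions chosen for the example. The sketch returns the two signals that a residual echo suppressor would typically receive from the LAEC stage, namely the residual signal and the adaptive filter output.

```python
import random


def nlms_echo_canceller(far_end, mic, num_taps=8, step_size=0.5, eps=1e-8):
    """Time-domain NLMS sketch (hypothetical helper, not the paper's LAEC).

    Returns (residual, echo_estimate, weights).
    """
    weights = [0.0] * num_taps   # running estimate of the echo-path impulse response
    buffer = [0.0] * num_taps    # most recent far-end samples, newest first
    residual, echo_estimate = [], []
    for x, d in zip(far_end, mic):
        buffer = [x] + buffer[:-1]
        y = sum(w * b for w, b in zip(weights, buffer))  # adaptive filter output
        e = d - y                                        # residual signal
        norm = sum(b * b for b in buffer) + eps          # input power normalization
        weights = [w + step_size * e * b / norm for w, b in zip(weights, buffer)]
        echo_estimate.append(y)
        residual.append(e)
    return residual, echo_estimate, weights


def simulate(num_samples=4000, seed=0):
    """Simulate a purely linear, noise-free echo (an assumed toy scenario)."""
    rng = random.Random(seed)
    echo_path = [0.6, -0.3, 0.1, 0.05]  # hypothetical loudspeaker-to-microphone response
    far_end = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    mic = [
        sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
        for n in range(num_samples)
    ]
    return far_end, mic, echo_path


if __name__ == "__main__":
    far_end, mic, echo_path = simulate()
    residual, echo_estimate, weights = nlms_echo_canceller(far_end, mic)
    print("estimated echo path:", [round(w, 3) for w in weights[:len(echo_path)]])
```

In this toy setting the echo path is exactly linear, so the filter can drive the residual toward zero. With a real loudspeaker the echo path includes nonlinear distortion that a linear filter cannot model, which is why a residual echo suppression stage is needed after the LAEC.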
