Abstract
Acoustic echo cancellation (AEC) methods aim to suppress the acoustic coupling between loudspeaker and microphone in hands-free speech communication. Traditional AEC works by identifying the acoustic impulse response using adaptive algorithms. With recent research advances, deep learning has become an attractive choice for AEC. This paper introduces a two-stage bidirectional long short-term memory (TS-BLSTM) framework that incorporates a multi-head self-attention mechanism after each BLSTM block. The attention is intended to capture contextual information more effectively and to improve the model's ability to handle complex acoustic scenarios. The BLSTM blocks aggregate magnitude spectrum information, modelling both time and frequency dependencies. Additionally, dilated convolution is introduced to broaden the range of information covered by each convolution output. The magnitude decoder estimates a mask for the input, which is applied to produce an estimated magnitude spectrum of the near-end speech. Experimental results indicate that the proposed method achieves promising results.
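As a rough illustration of two ideas mentioned above, the following sketch shows how a mask could be applied to a magnitude spectrum and how dilation widens the span of input seen by stacked convolutions. It is not the paper's implementation: the function names, the element-wise multiplicative form of the mask, and the kernel size and dilation rates are all assumptions chosen for illustration.

```python
import numpy as np


def apply_magnitude_mask(mic_magnitude, mask):
    """Element-wise masking of a magnitude spectrogram (frames x bins).

    Assumption: the mask is multiplicative and has the same shape as the
    input magnitude. The abstract does not specify the mask type.
    """
    mic_magnitude = np.asarray(mic_magnitude, dtype=float)
    mask = np.asarray(mask, dtype=float)
    if mic_magnitude.shape != mask.shape:
        raise ValueError("mask and magnitude must have the same shape")
    return mask * mic_magnitude


def dilated_receptive_field(kernel_size, dilations):
    """Receptive field, in input samples, of stacked stride-1 1-D convolutions.

    Each layer with dilation d extends the receptive field by
    (kernel_size - 1) * d samples.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    # Hypothetical 2-frame, 2-bin microphone magnitude and estimated mask.
    mic = np.array([[1.0, 2.0], [3.0, 4.0]])
    mask = np.array([[0.5, 1.0], [0.0, 0.25]])
    print(apply_magnitude_mask(mic, mask))

    # Three layers with kernel size 3: plain versus dilated (illustrative rates).
    print(dilated_receptive_field(3, [1, 1, 1]))
    print(dilated_receptive_field(3, [1, 2, 4]))
```

With the illustrative settings above, the dilated stack covers a wider span of the input than the plain stack for the same number of layers and parameters, which is the sense in which dilation broadens the information available to each output.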