Abstract
Speech enhancement plays an essential role in a wide range of speech processing applications. Recent studies on speech enhancement tend to investigate how to effectively capture the long-term contextual dependencies of speech signals to boost performance. However, these studies generally neglect the time-frequency (T-F) distribution information of speech spectral components, which is equally important for speech enhancement. In this paper, we propose a simple yet highly effective network module, termed the T-F attention (TFA) module, which uses two parallel attention branches, i.e., time-frame attention and frequency-channel attention, to explicitly exploit position information and generate a 2-D attention map that characterises the salient T-F distribution of speech. We validate the TFA module within two widely used backbone networks (the residual temporal convolution network and the Transformer) and conduct speech enhancement with four of the most popular training objectives. Extensive experiments demonstrate that the proposed TFA module consistently yields substantial improvements on the five most widely used objective metrics, with negligible parameter overhead. We further evaluate the efficacy of speech enhancement as a front-end for a downstream speech recognition task; the results show that the TFA module significantly improves the robustness of the recognition system under noisy conditions.
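The two-branch structure described above can be illustrated with a minimal NumPy sketch. Note this is only a schematic of the parallel-branch idea from the abstract: the real TFA module uses learned layers inside each branch, whereas here each branch is reduced to simple average pooling followed by a sigmoid, and the two 1-D attention vectors are combined into a 2-D T-F attention map by an outer product. The function name `tfa_sketch` and the single-channel input shape are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tfa_sketch(feat):
    """Schematic T-F attention over a (T, F) feature map.

    feat: 2-D array, T time frames by F frequency bins (single channel
    for simplicity; the learned sub-networks of the real module are
    replaced by pooling + sigmoid).
    """
    # Time-frame branch: pool over frequency -> one weight per frame.
    time_attn = sigmoid(feat.mean(axis=1))       # shape (T,)
    # Frequency-channel branch: pool over time -> one weight per bin.
    freq_attn = sigmoid(feat.mean(axis=0))       # shape (F,)
    # Outer product fuses the branches into a 2-D T-F attention map.
    attn_map = np.outer(time_attn, freq_attn)    # shape (T, F)
    # Re-weight the input feature map element-wise.
    return feat * attn_map

rng = np.random.default_rng(0)
x = rng.standard_normal((100, 257))  # 100 frames, 257 frequency bins
y = tfa_sketch(x)
print(y.shape)
```

Because each branch pools over the opposite axis, the attention map costs only O(T + F) branch outputs rather than O(T * F) learned weights, which is consistent with the negligible parameter overhead reported above.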
Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing