Abstract
The traditional graphical user interface in healthcare-oriented consumer electronics faces challenges such as high operational complexity, time-consuming operation, and a high risk of infection. Adopting a voice user interface (VUI) could promote network automation with enhanced efficiency, reduced complexity, and lower operating expense in various applications. Because such devices operate in noisy environments, speech enhancement is an indispensable component of VUIs for consumer devices. Recently, attention mechanisms have been studied for speech enhancement and have shown promising potential. In this paper, we propose a novel and effective attention module for speech enhancement, called neural-free attention (NFA), a lightweight, plug-and-play module that enables the backbone network to capture the energy distribution of speech signals across frequency channels. In particular, NFA adopts a learnable Gaussian function to perform the excitation operation and produce the attention weight for each frequency channel. NFA is comprehensively evaluated as part of a residual temporal convolution network (ResTCN) backbone on two commonly used training targets. Experimental results show that NFA substantially improves the ResTCN backbone in speech quality and intelligibility with extremely low parameter overhead. Moreover, ResTCN+NFA outperforms several recent baseline models, indicating strong potential for VUIs in consumer devices.
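As a rough illustration of the idea of Gaussian excitation over frequency channels, the following minimal sketch reweights a magnitude spectrogram. It is not the paper's implementation: the squeeze statistic (mean energy over time), the exact Gaussian form, the function name `gaussian_frequency_attention`, and the fixed values of `mu` and `sigma` (which would be learned in a real model) are all assumptions made for this example.

```python
import math


def gaussian_frequency_attention(spec, mu, sigma):
    """Reweight a magnitude spectrogram per frequency channel.

    spec      : list of frames, each a list of magnitudes (time x frequency)
    mu, sigma : Gaussian parameters; learnable in a real model, fixed here
                (hypothetical values, chosen only for illustration)
    """
    num_frames = len(spec)
    num_bins = len(spec[0])

    # Squeeze (assumed): mean energy of each frequency channel over time.
    energy = [
        sum(frame[f] ** 2 for frame in spec) / num_frames
        for f in range(num_bins)
    ]

    # Excitation: a Gaussian function maps each channel statistic to a
    # weight; the weight is largest (1.0) when the energy equals mu.
    weights = [
        math.exp(-((e - mu) ** 2) / (2.0 * sigma ** 2)) for e in energy
    ]

    # Scale: multiply every frame by the per-channel weights.
    scaled = [
        [frame[f] * weights[f] for f in range(num_bins)] for frame in spec
    ]
    return weights, scaled


# Toy input: 2 frames, 3 frequency channels.
spec = [[1.0, 2.0, 0.0], [1.0, 2.0, 0.0]]
weights, scaled = gaussian_frequency_attention(spec, mu=1.0, sigma=1.0)
```

In this sketch the only parameters are `mu` and `sigma`, which is one way an attention module could add very little parameter overhead; how NFA actually parameterizes and trains its Gaussian function is described in the paper body, not here.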