Abstract

The traditional graphical user interface in healthcare-oriented consumer electronics faces challenges such as high operational complexity, time-consuming operation, and a high risk of infection. Adopting a voice user interface (VUI) could promote network automation with enhanced efficiency, greater simplicity, and reduced operating expense in various applications. Because these devices operate in noisy environments, speech enhancement is an indispensable component of VUIs for consumer devices. Attention mechanisms have recently been studied for speech enhancement and show promising potential. In this paper, we propose a novel and effective attention module for speech enhancement, called neural-free attention (NFA): a lightweight, plug-and-play module that enables the backbone network to capture the energy distribution of speech signals across frequency-wise channels. In particular, NFA adopts a learnable Gaussian function to perform the excitation operation and produce the attention weight for each frequency channel. NFA is comprehensively evaluated within a residual temporal convolution network (ResTCN) backbone on two commonly used training targets. Experimental results show that NFA substantially improves the ResTCN backbone in speech quality and intelligibility, with extremely low parameter overhead. ResTCN+NFA also outperforms several recent baseline models, indicating strong potential for VUIs in consumer devices.
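
As a rough illustration of the excitation step described in the abstract, the sketch below computes one attention weight per frequency channel with a Gaussian function. It is not the authors' implementation: the squeeze step (mean energy over time), the normalization, the function name `gaussian_frequency_attention`, and the scalar parameters `mu` and `sigma` (which would be learned during training in a real model) are all assumptions made for this example.

```python
import numpy as np


def gaussian_frequency_attention(spec, mu, sigma):
    """Hypothetical sketch of Gaussian-based frequency attention.

    spec  : array of shape (frames, freq_bins), e.g. a magnitude spectrogram
    mu    : centre of the Gaussian (assumed learnable, fixed here)
    sigma : width of the Gaussian (assumed learnable, fixed here)
    """
    # Squeeze (assumption): mean energy of each frequency channel over time.
    energy = np.mean(spec ** 2, axis=0)

    # Normalize the per-channel energies so mu and sigma are scale-free.
    z = (energy - energy.mean()) / (energy.std() + 1e-8)

    # Excitation: a Gaussian function maps each descriptor to a weight in (0, 1].
    weights = np.exp(-((z - mu) ** 2) / (2.0 * sigma ** 2))

    # Re-weight every frame of the input along the frequency axis.
    return spec * weights[np.newaxis, :], weights


# Toy input: 100 frames, 257 frequency bins of random magnitudes.
rng = np.random.default_rng(0)
spec = np.abs(rng.standard_normal((100, 257)))
out, w = gaussian_frequency_attention(spec, mu=0.0, sigma=1.0)
```

In this sketch the excitation adds only two scalar parameters and no dense layers, which is in keeping with the low parameter overhead the abstract reports, although the actual parameterization of NFA may differ.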
