Abstract
In real-world applications, robustness against noise is crucial for small-footprint keyword spotting (KWS) systems, which are deployed on resource-limited devices. A reasonable way to improve noise robustness is to apply a speech enhancement model to the noisy speech first. However, current enhancement models require a large number of parameters and a large amount of computation, which does not satisfy the small-footprint requirement. In this paper, we design a lightweight enhancement model consisting of convolutional layers for feature extraction, recurrent layers for temporal modeling, and deconvolutional layers for feature recovery. To reduce the mismatch between the enhanced features and those expected by the KWS system, we further propose an efficient joint training framework in which the enhancement model and the KWS system are concatenated and jointly fine-tuned through a trainable feature transformation block. With joint training, linguistic information can be back-propagated from the KWS system to the enhancement model and guide its training. Our experimental results show that the proposed small-footprint enhancement model significantly improves the noise robustness of KWS systems without substantially increasing model size or computational complexity. Moreover, recognition performance can be further improved through the proposed joint training framework.

Keywords: Small footprint, Robust KWS, Speech enhancement
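As a rough illustration of what the small-footprint constraint means for a convolutional, recurrent, and deconvolutional stack like the one described above, the following Python sketch tallies trainable parameters per stage. Every size in it (40 feature bins, 4 channels, 3x3 kernels, stride 2 along frequency, a GRU as the recurrent layer) is a hypothetical value chosen for illustration; the abstract does not state the actual configuration, and the function names are not from the paper.

```python
def conv2d_params(in_ch, out_ch, kh, kw, bias=True):
    """Weights plus optional bias of a 2-D convolution."""
    return in_ch * out_ch * kh * kw + (out_ch if bias else 0)


def deconv2d_params(in_ch, out_ch, kh, kw, bias=True):
    """A transposed convolution has the same weight count as a convolution."""
    return in_ch * out_ch * kh * kw + (out_ch if bias else 0)


def gru_params(input_size, hidden_size):
    """Three gates, each with input weights, recurrent weights, and one bias.

    Some frameworks (PyTorch, for example) keep two bias vectors per gate,
    which adds another 3 * hidden_size parameters.
    """
    return 3 * (hidden_size * (input_size + hidden_size) + hidden_size)


def conv_out_size(size, kernel, stride, padding):
    """Output length of a strided convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def enhancement_budget(n_feat=40, channels=4, kernel=3, stride=2, padding=1):
    """Parameter tally for a hypothetical conv -> GRU -> deconv enhancer."""
    # Encoder: one conv layer that downsamples the frequency axis.
    freq = conv_out_size(n_feat, kernel, stride, padding)
    enc = conv2d_params(1, channels, kernel, kernel)
    # Recurrent stage: per-frame features are flattened to channels * freq.
    # Hidden size is set equal to the input size so that the output can be
    # reshaped back to (channels, freq) without an extra projection layer.
    width = channels * freq
    rnn = gru_params(width, width)
    # Decoder: one transposed conv that restores a single feature channel.
    dec = deconv2d_params(channels, 1, kernel, kernel)
    total = enc + rnn + dec
    return {"encoder": enc, "recurrent": rnn, "decoder": dec,
            "total": total, "bytes_fp32": total * 4}


if __name__ == "__main__":
    for name, value in enhancement_budget().items():
        print(f"{name:>10}: {value}")
```

Under these assumed sizes the recurrent stage accounts for nearly all of the parameters, so in a design of this shape the recurrent width is the main lever on footprint.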