Abstract
This paper proposes a small-footprint wake-up-word (WUW) recognition system for real noisy environments that employs a competing-words-based feature. The competing-words-based features are generated by a ResNet-based deep neural network with a small number of parameters, using the competing-words dataset, which consists of the words most acoustically similar and most dissimilar to the WUW used in our system. The obtained features are used as input to a classification network built on a convolutional neural network (CNN) model. To obtain sufficient training data, data augmentation is performed by applying a room impulse response filter and adding audio from various television shows as background noise, simulating an actual living-room environment. The experimental results demonstrate that the proposed WUW recognition system outperforms baselines that employ CNN and ResNet models. The proposed system achieves an equal error rate (EER) of 1.31% and a false rejection rate (FRR) of 1.40% at a 1.0% false alarm rate (FAR), which are relative improvements of 29.57% and 50.00%, respectively, over the ResNet system. The proposed system also uses 83.53% fewer parameters than the ResNet system. These results indicate that the competing-words-based feature is highly effective at improving WUW recognition performance in noisy environments with a smaller footprint.
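The results above are reported as EER and as FRR at a fixed 1.0% FAR. As a rough illustration of how such figures are commonly computed from detector scores, the Python sketch below sweeps decision thresholds over positive (WUW) and negative (non-WUW) scores. The function names, and the convention that a score at or above the threshold triggers a wake-up, are assumptions for illustration, not the authors' evaluation code.

```python
from typing import List, Tuple


def error_rates(pos: List[float], neg: List[float], thr: float) -> Tuple[float, float]:
    """Return (FRR, FAR) at threshold thr; a score >= thr counts as a wake-up (assumed)."""
    frr = sum(s < thr for s in pos) / len(pos)   # WUW utterances wrongly rejected
    far = sum(s >= thr for s in neg) / len(neg)  # non-WUW utterances wrongly accepted
    return frr, far


def candidate_thresholds(pos: List[float], neg: List[float]) -> List[float]:
    # Every observed score, plus +inf so a "never wake up" operating point exists.
    return sorted(set(pos) | set(neg)) + [float("inf")]


def eer(pos: List[float], neg: List[float]) -> float:
    """Approximate EER: mean of FRR and FAR at the threshold where they are closest."""
    best_gap, best_eer = float("inf"), 1.0
    for thr in candidate_thresholds(pos, neg):
        frr, far = error_rates(pos, neg, thr)
        if abs(frr - far) < best_gap:
            best_gap, best_eer = abs(frr - far), (frr + far) / 2
    return best_eer


def frr_at_far(pos: List[float], neg: List[float], target_far: float) -> float:
    """Lowest FRR over thresholds whose FAR does not exceed target_far."""
    rates = [error_rates(pos, neg, t) for t in candidate_thresholds(pos, neg)]
    return min(frr for frr, far in rates if far <= target_far)


def relative_improvement(baseline: float, proposed: float) -> float:
    """Relative error reduction in percent, assuming (baseline - proposed) / baseline."""
    return (baseline - proposed) / baseline * 100.0
```

If relative improvement is defined as (baseline - proposed) / baseline (an assumption; the abstract does not state the formula), the reported 29.57% and 50.00% would correspond to ResNet values of roughly 1.86% EER and 2.80% FRR. These baseline figures are inferred by arithmetic, not stated in the abstract.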
Highlights
Because speech recognition systems consume a large amount of resources, many systems employ wake-up-word (WUW) recognition to minimize computational load, switching to an active mode only once the WUW is recognized
To obtain sufficient training data, data augmentation is performed by applying a room impulse response filter and adding audio from various television shows as background noise, simulating an actual living-room environment
We propose a small-footprint WUW recognition system for noisy environments that employs a competing-words-based feature
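The augmentation described in the highlights combines reverberation with additive background noise. Below is a minimal pure-Python sketch, assuming the room impulse response (RIR) is applied by linear convolution and the television-show noise is mixed at a chosen signal-to-noise ratio (SNR). The SNR values, the tiling of short noise clips, and all function names are hypothetical, since the source does not specify them.

```python
import math
from typing import List


def convolve(x: List[float], h: List[float]) -> List[float]:
    """Full linear convolution of a dry signal x with a room impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def power(sig: List[float]) -> float:
    """Mean power of a signal."""
    return sum(s * s for s in sig) / len(sig)


def add_noise(speech: List[float], noise: List[float], snr_db: float) -> List[float]:
    """Scale noise so that 10*log10(P_speech / P_noise) equals snr_db, then mix."""
    # Hypothetical choice: tile (or crop) the noise clip to the speech length.
    noise = [noise[i % len(noise)] for i in range(len(speech))]
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def augment(clean: List[float], rir: List[float], tv_noise: List[float],
            snr_db: float) -> List[float]:
    """Reverberate the clean utterance, truncate to its original length, add noise."""
    reverberant = convolve(clean, rir)[:len(clean)]
    return add_noise(reverberant, tv_noise, snr_db)
```

In practice one would draw a random RIR, noise clip, and SNR per training example; a library convolution (for example an FFT-based one) would replace the quadratic loop for real audio lengths.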
Summary
Because speech recognition systems consume a large amount of resources, many systems employ wake-up-word (WUW) recognition to minimize computational load, switching to an active mode only once the WUW is recognized. Many studies on small-footprint keyword spotting have demonstrated the effectiveness of various deep networks, including convolutional neural networks (CNN) [4], convolution… We propose using competing words to improve WUW recognition performance while minimizing the model size of the system. A high-level feature was generated using the competing-words dataset and a residual network. This competing-words-based feature was then used as the input for training the CNN-based classification network. Because the target is a small-footprint system, we focused on minimizing the number of model parameters as well as increasing recognition accuracy.
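The source does not say how acoustic similarity to the WUW was measured when building the competing-words dataset. One simple way to select the most similar and most dissimilar words, assumed here purely for illustration, is to rank a lexicon by phoneme-level edit distance to the WUW pronunciation. The wake word, lexicon, and phoneme transcriptions below are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete pa
                           cur[j - 1] + 1,              # insert pb
                           prev[j - 1] + (pa != pb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def select_competing_words(wuw: Sequence[str], lexicon: Dict[str, List[str]],
                           k: int) -> Tuple[List[str], List[str]]:
    """Return (k most similar, k most dissimilar) words by phoneme edit distance."""
    # Ties are broken alphabetically so the selection is deterministic.
    ranked = sorted(lexicon, key=lambda w: (edit_distance(wuw, lexicon[w]), w))
    return ranked[:k], ranked[::-1][:k]


# Hypothetical wake word and lexicon (ARPAbet-style phonemes).
WUW = ["HH", "AH", "L", "OW"]
LEXICON = {
    "hollow": ["HH", "AA", "L", "OW"],
    "yellow": ["Y", "EH", "L", "OW"],
    "below": ["B", "IH", "L", "OW"],
    "table": ["T", "EY", "B", "AH", "L"],
    "music": ["M", "Y", "UW", "Z", "IH", "K"],
}
```

Here `select_competing_words(WUW, LEXICON, 2)` returns `(["hollow", "below"], ["music", "table"])`. A measure closer to the acoustics, such as distances between embeddings from a trained model, could replace the phoneme edit distance without changing the selection logic.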