Abstract

This paper proposes a small-footprint wake-up-word (WUW) recognition system for real noisy environments that employs a competing-words-based feature. The competing-words-based features are generated by a ResNet-based deep neural network with a small number of parameters, trained on a competing-words dataset. This dataset consists of the words most acoustically similar and most dissimilar to the WUW used in our system. The obtained features are used as input to the classification network, which is built on a convolutional neural network (CNN) model. To obtain sufficient training data, data augmentation is performed by applying a room impulse response filter and adding sound from various television shows as background noise, which simulates an actual living room environment. The experimental results demonstrate that the proposed WUW recognition system outperforms baselines that employ CNN and ResNet models. The proposed system achieves an equal error rate of 1.31% and a false rejection rate of 1.40% at a 1.0% false alarm rate, which are relative improvements of 29.57% and 50.00%, respectively, over the ResNet system. The number of parameters in the proposed system is 83.53% lower than in the ResNet system. These results indicate that the proposed system with the competing-words-based feature is highly effective at improving WUW recognition performance in noisy environments with a smaller footprint.
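
The metrics quoted in the abstract (equal error rate, false rejection rate at a fixed false alarm rate, and relative improvement) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the accept-if-score-is-at-or-above-threshold convention, and the toy scores are assumptions made only for illustration.

```python
# Illustrative only: function names, threshold convention, and toy scores
# are assumptions, not taken from the paper.

def far_frr(target_scores, nontarget_scores, threshold):
    """Return (false alarm rate, false rejection rate).

    A trial is accepted as the WUW when its score is >= threshold.
    """
    false_alarms = sum(1 for s in nontarget_scores if s >= threshold)
    false_rejects = sum(1 for s in target_scores if s < threshold)
    return (false_alarms / len(nontarget_scores),
            false_rejects / len(target_scores))


def frr_at_far(target_scores, nontarget_scores, max_far):
    """Lowest false rejection rate among thresholds with FAR <= max_far."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    thresholds.append(thresholds[-1] + 1.0)  # threshold that rejects everything
    best = 1.0
    for t in thresholds:
        far, frr = far_frr(target_scores, nontarget_scores, t)
        if far <= max_far:
            best = min(best, frr)
    return best


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: mean of FAR and FRR where the two are closest."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = None, None
    for t in thresholds:
        far, frr = far_frr(target_scores, nontarget_scores, t)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_improvement(baseline, proposed):
    """Relative reduction of an error rate, in percent."""
    return 100.0 * (baseline - proposed) / baseline


if __name__ == "__main__":
    wuw_scores = [0.9, 0.8, 0.7, 0.3]    # toy scores for true WUW utterances
    other_scores = [0.1, 0.2, 0.4, 0.6]  # toy scores for non-WUW audio
    print(equal_error_rate(wuw_scores, other_scores))
    print(frr_at_far(wuw_scores, other_scores, max_far=0.0))
    print(relative_improvement(baseline=2.0, proposed=1.0))
```

With real data the score lists would hold one detector score per test utterance, and `max_far` would be set to 0.01 to match the 1.0% false alarm operating point mentioned above.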

Highlights

  • Because speech recognition systems use a large amount of resources, many systems employ wake-up-word (WUW) recognition to minimize computational load: they are awakened to an active mode only once the WUW is recognized

  • To obtain sufficient data for training, data augmentation is performed by using a room impulse response filter and adding sound signals of various television shows as background noise, which simulates an actual living room environment

  • We propose a small-footprint WUW recognition system for noisy environments by employing the competing-words-based feature
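
The augmentation described in the highlights (room impulse response filtering followed by background noise mixing) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the signal-to-noise ratio parameter `snr_db`, and the looping of the noise clip are illustrative choices, and pure-Python lists stand in for real audio arrays.

```python
import math

# Illustrative only: names, the SNR parameter, and noise looping are
# assumptions; the source does not specify these details.


def convolve(signal, impulse_response):
    """Full linear convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def power(samples):
    """Mean squared amplitude of a list of samples."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the speech-to-noise power ratio is snr_db.

    The noise clip is looped to the speech length and must not be silent.
    """
    looped = [noise[i % len(noise)] for i in range(len(speech))]
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / power(looped))
    return [s + gain * n for s, n in zip(speech, looped)]


def augment(clean, rir, noise, snr_db):
    """Apply a room impulse response, then add background noise."""
    reverberant = convolve(clean, rir)[: len(clean)]  # keep original length
    return mix_at_snr(reverberant, noise, snr_db)


if __name__ == "__main__":
    clean = [1.0, -1.0, 1.0, -1.0]  # toy stand-in for a WUW utterance
    rir = [1.0, 0.5]                # toy room impulse response
    tv_noise = [0.5, 0.5]           # toy stand-in for television audio
    print(augment(clean, rir, tv_noise, snr_db=10.0))
```

The nested-loop convolution is written for readability; for real audio lengths an FFT-based convolution from a numerical library would be the practical choice.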


Summary

Introduction

Because speech recognition systems use a large amount of resources, many systems employ wake-up-word (WUW) recognition to minimize computational load: they are awakened to an active mode only once the WUW is recognized. Many studies on small-footprint keyword spotting have shown effectiveness by employing different types of deep networks, including convolutional neural networks (CNN) [4], convolution. We propose utilizing competing words to improve WUW recognition performance and minimize the model size of the system. A high-level feature was generated using the competing-words dataset and a residual network. This competing-words-based feature was then used as input to train the CNN-based classification network. For small-footprint systems, we focused on minimizing the number of model parameters as well as increasing the recognition accuracy.

Proposed WUW Recognition System
Selection of Competing Words
Generation of Competing-Words-Based Feature
Configurations of the generation
Classification Network
Analysis of the Competing-Words Network-Based Feature
Distribution contour curves of two-dimensional vector through
Database
Experimental Results
Conclusions
