Abstract

Keyword spotting (KWS) has attracted considerable research attention in recent years, driven by the growth of embedded command-recognition systems such as Alexa, Google Home, and Siri. Recognition performance, model size, processing time, and robustness to noise are fundamental in these systems, and embedded applications further demand computationally efficient models that can be implemented on current hardware. In this work, we evaluate a keyword recognition approach using three deep learning models: LeNet-5, SqueezeNet, and EfficientNet-B0. We evaluate transfer learning, pruning, and quantization strategies, training and testing on both noisy and clean speech signals. The compression techniques (pruning and quantization) are assessed in terms of the reduction in model footprint and the accuracy obtained in each case. Using Google's Speech Commands dataset with additive babble noise, our keyword recognition approach achieves an accuracy of 94.6% with unstructured pruning of 80% of the parameters of the original SqueezeNet network, which reduces the model size by 70%.
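
To make the idea of unstructured pruning at 80% concrete, the following is a minimal sketch, not the authors' implementation. It assumes a magnitude-based criterion (the abstract does not state which criterion was used) and operates on a flat list of synthetic weights rather than a real SqueezeNet; the names `magnitude_prune` and `sparsity_of` are hypothetical helpers introduced only for illustration.

```python
import random


def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction set to zero.

    Assumption: pruning is by absolute value, applied to individual weights
    (unstructured), with no regard to which layer or filter they belong to.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(round(sparsity * len(weights)))
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def sparsity_of(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


# Synthetic stand-in for a layer's parameters (not real model weights).
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]

pruned = magnitude_prune(weights, 0.8)
print("sparsity after pruning:", sparsity_of(pruned))
```

Zeroing weights does not by itself shrink a stored model; a smaller footprint only appears once the pruned model is saved in a sparse or compressed form, and how that was done in this work is not stated in the abstract.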
