Abstract

Deep neural network based wake word systems (using phrases such as "Hi Alexa" or "Hey Siri") allow increasingly accurate speech communication between humans and machines. However, such systems require high processing power or cloud services, which may not be available to edge devices. Currently, the accuracy of machine learning methods for voice activation on cloudless edge devices hovers below 90%. This paper explores wake word implementation on edge devices using a 2-dimensional Convolutional Neural Network (CNN) that balances improved accuracy against latency. The proposed CNN model is created, trained and quantized using TensorFlow on a PC and exported to a Raspberry Pi Zero 2 W. Quantization reduces the model size by 20%, and spectral gating is adopted to reduce wake word detection errors in moderately noisy environments. The proposed system achieved more than 90% wake word detection accuracy across 30 to 50 dB of background noise, with an average response time of 1.03 seconds for the intended user. The results show that a low-powered edge device can still offer competitive wake word detection performance without cloud services.
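The abstract does not describe how spectral gating is implemented, so the following is only a minimal sketch of the general technique, not the authors' code. It assumes a separate noise-only recording is available for estimating a per-frequency threshold; the function names, the frame size (`n_fft=512`), the hop (`hop=128`) and the parameters `n_std` and `reduction` are all illustrative assumptions.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Short-time Fourier transform using a Hann window (frames x bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=512, hop=128):
    """Inverse STFT by weighted overlap-add with window-squared normalization."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    out = np.zeros(hop * (len(frames) - 1) + n_fft)
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)

def spectral_gate(signal, noise_clip, n_std=1.5, reduction=0.9,
                  n_fft=512, hop=128):
    """Attenuate time-frequency bins that fall below a noise threshold.

    The threshold for each frequency bin is the mean plus n_std standard
    deviations of the noise clip's magnitude in that bin (an assumed rule).
    """
    noise_mag = np.abs(stft(noise_clip, n_fft, hop))
    threshold = noise_mag.mean(axis=0) + n_std * noise_mag.std(axis=0)

    # Pad so every retained sample is covered by fully overlapping frames.
    padded = np.pad(signal, n_fft, mode="reflect")
    spec = stft(padded, n_fft, hop)

    # Keep bins above the threshold; scale the rest by (1 - reduction).
    gain = np.where(np.abs(spec) >= threshold, 1.0, 1.0 - reduction)
    cleaned = istft(spec * gain, n_fft, hop)
    return cleaned[n_fft:n_fft + len(signal)]
```

In a pipeline like the one described, such a function would be applied to each captured audio buffer before feature extraction for the CNN, with `noise_clip` taken from a stretch of audio known to contain no speech.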
