Abstract
Keyword spotting (KWS) is the task of recognizing a keyword or a particular command in a continuous audio stream, and it can be effectively applied in voice trigger systems that automatically monitor and process speech signals. This paper focuses on the problem of user-defined keyword spotting in low-resource settings. A lightweight neural network architecture is developed to tackle the keyword detection task using query-by-example (QbyE) techniques. The architecture uses a convolutional recurrent neural network (CRNN) to extract frame-level features from input audio signals. A customized model compression method is proposed to compress the network, making it suitable for low-power settings. During keyword enrollment, all enrolled keyword examples are merged into a single keyword template, which is then used to detect the target keyword during keyword search. To improve the efficiency of keyword searching, a segmental local normalized DTW algorithm is introduced. Experiments on real-world collected datasets show that our approach consistently outperforms state-of-the-art methods, and the proposed system can run on an ARM Cortex-A7 processor while achieving real-time keyword detection.
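The template-matching step described above can be illustrated with a minimal sketch. This is not the paper's segmental local normalized DTW; it is a plain DTW with a simple path-length normalization, applied to a sliding window over the stream, shown only to convey how a QbyE system scores a keyword template against incoming frame-level features. All function names and the window/hop parameters are illustrative assumptions.

```python
import math

def dtw_distance(query, segment):
    """Length-normalized DTW between two sequences of feature vectors.
    NOTE: an illustrative simplification, not the paper's segmental
    local normalized DTW."""
    n, m = len(query), len(segment)
    INF = float("inf")
    # cost[i][j]: accumulated cost of aligning query[:i] with segment[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(query[i - 1], segment[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j],       # insertion
                                 cost[i][j - 1],       # deletion
                                 cost[i - 1][j - 1])   # match
    return cost[n][m] / (n + m)  # normalize by an upper bound on path length

def sliding_search(template, stream, win, hop=1):
    """Score every window of the stream against the keyword template;
    a detection fires when the best score falls below a threshold."""
    scores = []
    for start in range(0, len(stream) - win + 1, hop):
        scores.append(dtw_distance(template, stream[start:start + win]))
    return min(scores) if scores else float("inf")
```

In a real system the inputs would be CRNN frame-level embeddings rather than raw features, and the segmental/local normalization in the paper exists precisely to avoid rescoring every window from scratch as this naive sliding search does.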