Abstract

With the development of mobile smart devices, keyword spotting (KWS) plays an important role in interaction between machines and users. However, the limited storage and energy budgets of mobile devices constrain the accuracy that keyword spotting systems can achieve. Balancing high accuracy against low resource consumption is therefore an active research topic for keyword spotting. Convolutional neural networks (CNNs) have been widely adopted in recent keyword spotting systems because of their high accuracy, and the success of the transformer architecture in many areas demonstrates the effectiveness of self-attention. In this paper, we combine self-attention with convolutional neural networks and propose the broadcast attention learning network (BA-net), which uses a small number of parameters while achieving accuracies of 97.18% on the Google Speech Commands dataset and 78.44% on a real telephone speech dataset.
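The abstract does not say how BA-net's broadcast attention is computed, so the sketch below is not BA-net. It is a generic illustration, under stated assumptions, of the two ingredients the abstract combines: a 1-D convolution over a sequence of feature frames (for example, per-frame acoustic features; an assumption) followed by standard scaled dot-product self-attention, with a residual sum. For brevity it uses identity query/key/value projections (no learned weights) and an odd kernel length. The names `conv1d_same`, `self_attention`, and `conv_then_attention` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # T x D: time frames x feature channels


def conv1d_same(x: Matrix, kernel: List[float]) -> Matrix:
    """Depthwise 1-D convolution along time with zero padding ('same' length).

    The same kernel is applied to every feature channel. Like most deep-learning
    conv layers this is a cross-correlation. Assumes an odd kernel length.
    """
    T, D, k = len(x), len(x[0]), len(kernel)
    pad = k // 2
    out = []
    for t in range(T):
        row = []
        for d in range(D):
            acc = 0.0
            for j in range(k):
                src = t + j - pad
                if 0 <= src < T:  # frames outside the sequence count as zero
                    acc += kernel[j] * x[src][d]
            row.append(acc)
        out.append(row)
    return out


def softmax(v: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def self_attention(x: Matrix) -> Matrix:
    """Scaled dot-product self-attention with identity Q/K/V projections (assumption)."""
    T, D = len(x), len(x[0])
    scale = 1.0 / math.sqrt(D)
    out = []
    for t in range(T):
        # Similarity of frame t to every frame s, scaled by 1/sqrt(D).
        scores = [scale * sum(x[t][d] * x[s][d] for d in range(D)) for s in range(T)]
        w = softmax(scores)
        # Output frame t is an attention-weighted average of all frames.
        out.append([sum(w[s] * x[s][d] for s in range(T)) for d in range(D)])
    return out


def conv_then_attention(x: Matrix, kernel: List[float]) -> Matrix:
    """Local features from the convolution, global context from attention, residual sum."""
    h = conv1d_same(x, kernel)
    a = self_attention(h)
    return [[h[t][d] + a[t][d] for d in range(len(h[0]))] for t in range(len(h))]


if __name__ == "__main__":
    frames = [[0.1, 0.2, 0.3], [0.0, 1.0, 0.5], [0.4, 0.4, 0.4], [1.0, 0.0, 0.2]]
    y = conv_then_attention(frames, [0.25, 0.5, 0.25])
    print(len(y), len(y[0]))  # 4 3
```

The convolution captures short-range patterns across neighbouring frames, while attention lets every frame draw on the whole utterance; a real KWS model would add learned projections, multiple channels, and a classifier head.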

