Abstract

For continuous keyword detection, dynamic programming (DP) matching has the advantage that it can detect any keyword without retraining the system. Previous research reported higher detection accuracy for 2D-RNN based DP matching than for conventional DP and embedding methods. However, 2D-RNN based DP matching has a high computational cost. To address this problem, we combine a convolutional neural network (CNN) and 2D-RNN based DP matching into a unified framework whose reduction in computational cost is of polynomial order, depending on the kernel size and the number of CNN layers. Experimental results using the Google Speech Commands Dataset and noise data from the CHiME-3 challenge demonstrate that our proposed model improves open keyword detection performance compared to the embedding-based baseline system, while running nine times faster than the previous 2D-RNN DP matching.
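
To illustrate why shortening the input sequences lowers the cost of DP matching, the following is a minimal sketch of conventional DP matching (dynamic time warping) on toy one-dimensional features. It is not the 2D-RNN model described above. The function names `dtw_distance` and `downsample` are hypothetical, and the averaging in `downsample` is only an assumed stand-in for whatever temporal reduction the CNN front end performs, which the abstract does not specify.

```python
import math


def dtw_distance(query, utterance):
    """Conventional DP matching (DTW) between two 1-D feature sequences.

    Returns (accumulated cost, number of DP grid cells evaluated).
    """
    n, m = len(query), len(utterance)
    # cost[i][j] is the best accumulated cost of aligning query[:i] with utterance[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(query[i - 1] - utterance[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in the query only
                cost[i][j - 1],      # advance in the utterance only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m], n * m


def downsample(seq, factor):
    """Average non-overlapping windows (assumed stand-in for a CNN front end)."""
    windows = [seq[k:k + factor] for k in range(0, len(seq), factor)]
    return [sum(w) / len(w) for w in windows]


# Toy sequences: a 40-frame keyword template and a 100-frame utterance.
query = [float(i % 5) for i in range(40)]
utterance = [float(i % 7) for i in range(100)]

_, cells_full = dtw_distance(query, utterance)
_, cells_reduced = dtw_distance(downsample(query, 2), downsample(utterance, 2))
print(cells_full, cells_reduced)
```

In this sketch, halving the length of both sequences cuts the number of DP grid cells to a quarter, because the grid size is the product of the two sequence lengths. The figures in the abstract come from the authors' experiments, not from this toy calculation.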
