Abstract

Traditional approaches to keyword spotting employ a large-vocabulary speech recognizer, a phone recognizer, or a whole-word approach such as whole-word Hidden Markov Models. Each of these approaches requires considerable speech resources to create a word-spotting system. In this paper we describe a keyword spotting system that requires about fifteen minutes of speech with word-level transcriptions as its sole annotated resource. The system uses our self-organizing speech recognizer, which defines its own sound units, as the recognizer for speech in the domain under consideration. The transcriptions are used to train a grapheme-to-sound-unit converter. We describe this novel system and report its keyword spotting performance.
