Abstract

In this work, we formulate the problem of keyword spotting as a non-uniform error automatic speech recognition (ASR) problem and propose a model training methodology based on the non-uniform minimum classification error (MCE) approach. The main idea is to adapt the fundamental criteria to reflect the cost-sensitive notion in that errors on keywords are much more significant than errors on non-keywords in an automatic speech recognition task. The notion of cost sensitivity leads to emphasis of keyword models in parameter optimization. Then we present a system which takes advantage of the weighted finite-state transducer (WFST) framework to efficiently implement the non-uniform MCE. To enhance the approach of non-uniform error cost minimization for keyword spotting, we further formulate a technique called adaptive boosted non-uniform MCE which incorporates the idea of boosting. We validate the proposed framework on two challenging large-scale spontaneous conversational telephone speech (CTS) datasets in two different languages (English and Mandarin). Experimental results show our framework can achieve consistent and significant spotting performance gains over both the maximum likelihood estimation (MLE) baseline and conventional discriminatively-trained systems with uniform error cost.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.