Abstract

In telephone speech recognition, the acoustic mismatch between training and testing environments often causes a severe degradation in the recognition performance. This paper presents a keyword-driven two-level codebook-based stochastic matching (CBSM) algorithm to eliminate the acoustic mismatch. Additionally, in Mandarin speech, it is difficult to correctly recognize the unvoiced part in a syllable. In order to reduce the recognition error of unvoiced segments, a fuzzy search algorithm is proposed to extract keyword candidates from a syllable lattice. Finally, a keyword relation and a weighting function for keyword combinations are presented for multi-keyword spotting. In the multi-keyword spotting of Mandarin speech, 94 right context-dependent and 38 context-independent subsyllables are used as the basic recognition units. A corresponding anti-subsyllable model for each subsyllable is trained and used for verification. In this system, 2583 faculty names and 39 department names are selected as the primary keywords and the secondary keywords, respectively. Using a testing set of 3088 conversational speech utterances from 33 speakers (20 male, 13 female), these techniques reduced the recognition error rate from 29.6% to 20.6% for multi-keywords embedded in non-keyword speech.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.