Abstract

This paper presents a new search strategy for large-vocabulary continuous Mandarin speech recognition that exploits the special structure of the Chinese language. The strategy consists of a forward pass and a backward pass, between which a high-quality syllable lattice is generated to bridge the syllable-level and word-level decoding processes. In the forward pass, because the Chinese language has a small number of syllables, a frame-synchronous stack decoder is used to integrate a high-order syllable N-gram language model and thereby generate a very accurate and compact syllable lattice. In the backward pass, because the Chinese language has a special monosyllabic word structure, the search space for word-level decoding is expanded dynamically from the syllable lattice, and the best word sequence is extracted using the knowledge provided by the word pronunciation lexicon and the word N-gram language model. In preliminary experiments on a speaker-adaptive continuous speech recognition task, this strategy reduced the character error rate by more than 20% compared with a previous system that used a syllable-aligned lattice approach.
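
The word-level step can be illustrated with a minimal sketch. It is not the decoder described in this paper: it assumes a toy lattice given as a list of scored syllable edges between integer time nodes, a hypothetical lexicon mapping words to syllable tuples, and a unigram word model in place of a general word N-gram. It also scans the lattice in increasing node order for simplicity instead of running as a backward pass. All names (`decode_words`, `edges`, `lexicon`, `word_logprob`) and all scores are illustrative assumptions.

```python
from collections import defaultdict


def decode_words(edges, lexicon, word_logprob, start, end):
    """Return (score, words) for the best word sequence, or None if none exists.

    edges: list of (from_node, to_node, syllable, log_score); nodes are
           integers ordered in time, so from_node < to_node (assumption).
    lexicon: dict mapping word -> tuple of syllables (hypothetical entries).
    word_logprob: dict mapping word -> unigram log probability (assumption;
                  a stand-in for a word N-gram model).
    """
    # Index outgoing syllable edges by their start node.
    out = defaultdict(list)
    for u, v, syl, score in edges:
        out[u].append((v, syl, score))

    def match(node, syllables):
        """Yield (end_node, lattice_score) for each path spelling `syllables`."""
        if not syllables:
            yield node, 0.0
            return
        for v, syl, score in out[node]:
            if syl == syllables[0]:
                for end_node, rest in match(v, syllables[1:]):
                    yield end_node, score + rest

    # best[node] = (best total log score reaching node, word sequence so far)
    best = {start: (0.0, [])}
    for node in range(start, end):
        if node not in best:
            continue  # no word boundary can fall on this node
        base, words = best[node]
        # Expand word hypotheses from this node only when it is reachable.
        for word, syllables in lexicon.items():
            for nxt, lattice_score in match(node, syllables):
                cand = base + lattice_score + word_logprob[word]
                if nxt not in best or cand > best[nxt][0]:
                    best[nxt] = (cand, words + [word])
    return best.get(end)


# Toy lattice: two competing syllables in the first slot, one in the second.
edges = [
    (0, 1, "zhong", -1.0),
    (0, 1, "zong", -1.2),
    (1, 2, "guo", -0.5),
]
# Hypothetical lexicon: one two-syllable word and three one-syllable words.
lexicon = {
    "zhongguo": ("zhong", "guo"),
    "zhong": ("zhong",),
    "zong": ("zong",),
    "guo": ("guo",),
}
word_logprob = {"zhongguo": -1.0, "zhong": -2.0, "zong": -2.0, "guo": -2.5}

result = decode_words(edges, lexicon, word_logprob, start=0, end=2)
print(result)  # → (-2.5, ['zhongguo'])
```

In this toy case the two-syllable word wins over the two one-syllable alternatives because it pays the word-model cost once. Because the model is a unigram, one best hypothesis per node is sufficient; a higher-order word N-gram would need the word history in the search state as well.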
