Complete recognition of continuous Mandarin speech for Chinese language with very large vocabulary but limited training data

Hsin-Min Wang, Yen-Ju Yang, Jia-Lin Shen, Lin-Shan Lee, Chiu-Yu Tseng

doi:10.1109/icassp.1995.479273

Abstract

This paper presents the first known results for complete recognition of continuous Mandarin speech for the Chinese language with a very large vocabulary but very limited training data. Although some isolated-syllable-based and isolated-word-based large-vocabulary Mandarin speech recognition systems have been successfully developed, a continuous-speech-based system of this kind has not been reported before. Several important techniques were used to develop this system: acoustic modeling with one set of sub-syllabic models for base syllable recognition and another set of context-dependent models for tone recognition; a multiple-candidate search technique based on a concatenated syllable matching algorithm to synchronize base syllable and tone recognition; and a word-class-based Chinese language model for linguistic decoding. The best recognition accuracy achieved is 88.69% for the finally decoded Chinese characters, with 88.69%, 91.57%, and 81.37% accuracy for base syllables, tones, and tonal syllables, respectively.

Full Text