An improved lexicon generation method for Mandarin speech recognition

Yike Zhang, Pengyuan Zhang, Xia Jia, Qingwei Zhao, Yonghong Yan, Zhen Dong

doi:10.1109/fskd.2017.8393350

Abstract

The language model (LM) plays a vital role in automatic speech recognition (ASR) systems. Since the performance of an LM is highly dependent on the lexicon, hand-crafted lexicons are commonly used for ASR tasks. However, high-quality Chinese lexicons are hard to construct because they require a great deal of time and manpower. In this paper, we propose an improved lexicon generation method for Mandarin speech recognition and apply the generated lexicons to word segmentation and ASR tasks. In the word segmentation experiments, we evaluate the proposed method on the MSR and PKU data sets provided by the Second SIGHAN Bakeoff. Results show that the proposed method achieves higher F-scores than previous lexicon generation methods. When the generated lexicons are used to train LMs for ASR tasks, the proposed method further reduces the character error rate (CER) on both telephone speech evaluation sets compared with previous methods.

Full Text