Abstract

Language models (LMs) play a vital role in automatic speech recognition (ASR) systems. Since the performance of an LM depends heavily on the lexicon, hand-crafted lexicons are commonly used for ASR tasks. However, high-quality Chinese lexicons are hard to construct because they require a great deal of time and manpower. In this paper, we propose an improved lexicon generation method for Mandarin speech recognition and apply the generated lexicons to word segmentation and ASR tasks. In the word segmentation experiments, we evaluate the proposed method on the MSR and PKU data sets provided by the Second SIGHAN Bakeoff. Results show that the proposed method achieves higher F-scores than previous lexicon generation methods. When the generated lexicons are used to train LMs for ASR, the proposed method further reduces the character error rate (CER) on both telephone speech evaluation sets compared with previous methods.
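
For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how they are commonly computed. It is not the scoring code used in the paper. The function names (`edit_distance`, `cer`, `word_spans`, `segmentation_f_score`) and the example sentences are illustrative assumptions. CER is taken here as the character-level edit distance divided by the reference length, and the segmentation F-score as the harmonic mean of precision and recall over word spans.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance over reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def word_spans(words):
    """Map a segmented sentence to a set of (start, end) character spans."""
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def segmentation_f_score(gold, pred):
    """F-score over word spans; a word counts as correct only if both
    of its boundaries match the gold segmentation."""
    g, p = word_spans(gold), word_spans(pred)
    correct = len(g & p)
    if correct == 0:
        return 0.0
    precision = correct / len(p)
    recall = correct / len(g)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical example sentences, not taken from the MSR or PKU data.
    print(cer("今天天气很好", "今天天汽很好"))
    print(segmentation_f_score(["今天", "天气", "很", "好"],
                               ["今天", "天", "气", "很", "好"]))
```

In this sketch, a lexicon that lacks the word 天气 would push a segmenter toward the split 天 / 气, which lowers both precision and recall. This is one way lexicon quality shows up in the F-score.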
