Abstract

A language model (LM) is an essential component of speech recognition systems. The efficiency of this model depends on how well it is adapted to the linguistic characteristics of the target language. Accordingly, adaptation methods attempt to use syntactic and semantic features for language modelling. Previous adaptation methods, such as the family of Dirichlet class language models (DCLM), exploit the classes of history words. Because they lack syntactic information, these methods are not appropriate for morphologically rich languages such as Farsi. This paper presents an approach for using syntactic information, such as part-of-speech (POS) tags, in DCLM and combining it with a factored language model (FLM). In our proposed idea, word clustering is based on the POS tags of the previous words and on the history words. Different LMs are evaluated experimentally on the BijanKhan corpus. The experiments indicate that using POS information together with the history words and the classes of the history words improves the FLM and reduces perplexity on our corpus. Moreover, the LMs are evaluated on the Farsdat corpus in a hidden Markov model (HMM)-based automatic speech recognition (ASR) system. Exploiting POS information along with DCLM yields a 1.2% relative reduction in the word error rate (WER) of the ASR system compared with the DCLM.
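The abstract reports results in terms of perplexity and relative WER gain. As a minimal sketch (not the authors' code), both measures can be computed as below. It assumes perplexity is the exponential of the average negative natural-log probability per test word, and that relative gain is (baseline WER minus new WER) divided by baseline WER. The function names and all numbers are hypothetical illustrations, not results from the paper.

```python
import math
from typing import Sequence


def perplexity(word_log_probs: Sequence[float]) -> float:
    """Perplexity from the natural-log probabilities an LM assigns to each test word.

    Hypothetical helper: PPL = exp(-(1/N) * sum(log p(w_i | history_i))).
    """
    if not word_log_probs:
        raise ValueError("need at least one word")
    avg_neg_log_prob = -sum(word_log_probs) / len(word_log_probs)
    return math.exp(avg_neg_log_prob)


def relative_wer_gain(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction, in percent, of a new system over a baseline.

    Hypothetical helper: 100 * (baseline - new) / baseline.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Made-up per-word probabilities, for illustration only.
    probs = [0.1, 0.25, 0.05, 0.2]
    print(f"perplexity: {perplexity([math.log(p) for p in probs]):.3f}")
    # Made-up WER values (in %), chosen only to show what a 1.2% relative gain means.
    print(f"relative gain: {relative_wer_gain(25.0, 24.7):.2f}%")
```

Note that a 1.2% relative gain is much smaller than a 1.2-point absolute drop: with a hypothetical baseline WER of 25.0%, it corresponds to a new WER of about 24.7%.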
