Ameliorated language modelling for lecture speech recognition of Indian English

Disha Kaur Phull, G Bharadwaja Kumar

doi:10.1007/s12046-018-0976-x

Abstract

A growing body of research addresses the automatic transcription of lectures, which contain a wealth of information and knowledge that could benefit educational systems and institutions. In large-vocabulary speech recognition, the language model plays a paramount role in reducing the enormous search space. However, language modelling is very brittle when moving from one domain to another or from read speech to spontaneous speech. Lecture speech also exhibits some characteristics of spontaneous speech, so building a language model for this task is very challenging. This paper presents an approach for adapting the language model so that it closely matches the topic spoken in the lecture. Language models built with the proposed approach are evaluated against existing language models such as CMU Sphinx, Gigaword and HUB-4. The results show that the language models built with the proposed approach outperform the existing language models in terms of word error rate, perplexity and out-of-vocabulary rate. The analysis shows that the presented two-phase approach yields an average decrease in word error rate of approximately 14% and halves the perplexity on average.

More From: Sādhanā


Similar Papers

Modelo Acústico y de Lenguaje del Idioma Español para el dialecto Cucuteño, Orientado al Reconocimiento Automático del Habla
Juan David Celis Nuñez ... Rodrigo Andres Llanos Castro
Ingeniería | VOL. 22
12 Sep 2017

Using Morphological Data in Language Modeling for Serbian Large Vocabulary Speech Recognition.
Edvin Pakoci ... Branislav Popović
Computational Intelligence and Neuroscience | VOL. 2019
03 Mar 2019

Construction of Language Models for Uzbek Language
N.S. Mamatov ... N.A. Niyozmatova
28 Sep 2022

Text Corpus Augmentation to Represent Filled Pause in Indonesian Spontaneous Speech Recognition System
Candra Bella Vista ... Dessi Puji Lestari
01 Sep 2019
