Abstract
Many under-resourced languages, such as dialectal Arabic or Hindi sub-dialects, lack sufficient in-domain text to build strong language models for use with automatic speech recognition (ASR). Semi-supervised language modeling uses a speech-to-text system to produce automatic transcripts from a large amount of in-domain audio, typically to augment a small set of manual transcripts. In contrast to the success of semi-supervised acoustic modeling, conventional language modeling techniques have provided only modest gains in this setting. This paper first explains the limitations of back-off language models, which stem from their dependence on long-span n-grams that are difficult to estimate accurately from automatic transcripts. From this analysis, we motivate a more robust use of the automatic counts: as a prior over the estimated parameters of a log-linear language model. We demonstrate consistent gains for semi-supervised language models across a range of low-resource conditions.
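The abstract does not spell out the exact form of the prior, so the following is only a minimal sketch of the general idea: a log-linear bigram model is trained on a small manual corpus under a Gaussian (L2) prior whose mean is set from smoothed automatic-transcript counts, rather than interpolating or pooling the counts directly. The toy corpora `auto_sents` and `manual_sents` and the hyperparameters `alpha`, `lam`, and `lr` are invented for illustration and are not from the paper.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpora: a large, noisy "automatic" set and a small,
# clean "manual" set. A real system would use ASR output and hand transcripts.
auto_sents = [["the", "cat", "sat"], ["the", "cat", "ran"], ["a", "dog", "sat"]] * 50
manual_sents = [["the", "dog", "sat"], ["the", "cat", "sat"]]

BOS, EOS = "<s>", "</s>"
vocab = sorted({w for s in auto_sents + manual_sents for w in s} | {EOS})

def bigram_counts(sentences):
    """Raw bigram counts, including sentence-boundary transitions."""
    counts = defaultdict(Counter)
    for sent in sentences:
        for h, w in zip([BOS] + sent, sent + [EOS]):
            counts[h][w] += 1
    return counts

auto_counts = bigram_counts(auto_sents)
manual_counts = bigram_counts(manual_sents)
histories = sorted(set(auto_counts) | set(manual_counts))

# Prior mean mu: additively smoothed log-probabilities from automatic counts.
alpha = 0.5
mu = {}
for h in histories:
    total = sum(auto_counts[h].values()) + alpha * len(vocab)
    for w in vocab:
        mu[(h, w)] = math.log((auto_counts[h][w] + alpha) / total)

# Log-linear bigram model: p(w|h) = exp(theta[h,w]) / sum_w' exp(theta[h,w']).
# MAP training on the manual transcripts with a Gaussian prior centred at mu:
#   minimize  NLL(manual; theta) + (lam / 2) * sum((theta - mu)^2)
theta = dict(mu)  # start at the prior mean
lam, lr = 1.0, 0.1
for step in range(200):
    grad = defaultdict(float)
    for h in manual_counts:
        n_h = sum(manual_counts[h].values())
        z = sum(math.exp(theta[(h, w)]) for w in vocab)
        for w in vocab:
            p = math.exp(theta[(h, w)]) / z
            # dNLL/dtheta[h,w] = expected count under the model - observed count
            grad[(h, w)] += n_h * p - manual_counts[h][w]
    for key in theta:
        grad[key] += lam * (theta[key] - mu[key])  # pull toward the automatic prior
        theta[key] -= lr * grad[key]

# The estimate for p(. | "dog") now blends the single manual observation
# with the much larger, but noisier, automatic evidence.
h = "dog"
z = sum(math.exp(theta[(h, w)]) for w in vocab)
print({w: round(math.exp(theta[(h, w)]) / z, 3) for w in ("sat", "ran")})
```

Under this setup, a history with no manual evidence keeps its parameters at the prior mean, so the model falls back on the automatic counts; where manual counts exist, they pull the estimates away from the noisy prior, which is the robustness property the abstract contrasts with direct count-based back-off estimation.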