Abstract

The incorporation of grammatical information into speech recognition systems is often used to increase performance in morphologically rich languages. However, this introduces demands for sufficiently large training corpora and for proper methods of using the additional information. In this paper, we present a method for building factored language models that use data obtained by morphosyntactic tagging. The models use only the relevant factors that help to increase performance and ignore data from other factors, thus also reducing the need for large morphosyntactically tagged training corpora. Which data is relevant is determined at run-time, based on the current text segment being estimated, i.e., the context. We show that, using a context-dependent model in a two-pass recognition algorithm, the overall speech recognition accuracy in a Broadcast News application improved by 1.73% relative, while simpler models using the same data achieved only a 0.07% improvement. We also present a more detailed error analysis based on lexical features, comparing first-pass and second-pass results.
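
The abstract describes a two-pass setup in which a simple first-pass model produces hypotheses that a context-dependent factored model then rescores. Below is a minimal sketch of such N-best rescoring; the `flm_score` callable and the log-linear `weight` are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of the two-pass idea: a first pass produces N-best
# hypotheses with a simple model, and a second pass rescores them with a
# context-dependent factored LM. `flm_score` and `weight` are assumptions.

def rescore_nbest(nbest, flm_score, weight=0.5):
    """Return the best hypothesis after interpolating the first-pass score
    with a second-pass factored-LM score (both in the log domain).

    nbest     -- list of (words, first_pass_logprob) pairs
    flm_score -- function mapping a word sequence to a log probability
    """
    best_words, best_score = None, float("-inf")
    for words, first_logp in nbest:
        score = (1 - weight) * first_logp + weight * flm_score(words)
        if score > best_score:
            best_words, best_score = words, score
    return best_words
```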

Highlights

  • Speech recognition still performs poorly in inflectional languages compared to mainstream languages like English

  • We evaluated the performance of the proposed language models on a large vocabulary continuous speech recognition (LVCSR) application in an inflective language, namely Slovene Broadcast News transcription

  • Results show that the real-time factor (RTF) increases by a factor of 2 if the vocabulary size is increased from 60K to 300K, and by a factor of 3 when trigram models are used instead of bigram models

Summary

Introduction

Speech recognition still performs poorly in inflectional languages compared to mainstream languages like English. We can build FLMs with a limited number of factors in each probability estimation, which can improve recognition performance while avoiding new data sparsity problems. This method makes use of grammatical properties of the target language, as the process of determining the backoff path searches for specific correlations in a given sentence structure. Sak [11] showed for the Turkish language that the improvements achieved by using FLMs rather than traditional n-gram models are greater when only limited-size corpora are available for training; the improvements decreased as the corpus size increased. These results can be considered relevant to speech recognition in specific domains with limited training data, where data sparsity becomes a problem for word-based models.
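
To make the idea of backoff over factors concrete, here is a minimal sketch of a factored language model with a bigram context and a fixed backoff path from word form to lemma to morphosyntactic (MSD) tag. The factor set, the single-parent context, and the unsmoothed relative-frequency estimates are simplifying assumptions for illustration, not the configuration used in the paper.

```python
from collections import Counter

# A minimal factored-LM sketch (bigram context only). Assumptions: each token
# is a (word, lemma, msd_tag) factor bundle, the backoff path is fixed as
# word -> lemma -> tag -> unigram, and probabilities are unsmoothed relative
# frequencies. This illustrates the technique, not the paper's configuration.

class TinyFLM:
    def __init__(self):
        self.ngrams = Counter()    # (parent_context, word) counts
        self.contexts = Counter()  # parent_context counts

    @staticmethod
    def _backoff_path(prev):
        word, lemma, tag = prev
        # Progressively more general views of the previous token
        return [("w", word), ("l", lemma), ("t", tag), ("0",)]

    def train(self, sentences):
        for sent in sentences:            # sent: list of factor bundles
            for i in range(1, len(sent)):
                word = sent[i][0]
                for ctx in self._backoff_path(sent[i - 1]):
                    self.ngrams[(ctx, word)] += 1
                    self.contexts[ctx] += 1

    def prob(self, word, prev):
        # Back off to the first context level observed in training
        for ctx in self._backoff_path(prev):
            if self.contexts[ctx]:
                return self.ngrams[(ctx, word)] / self.contexts[ctx]
        return 0.0
```

As a usage illustration with made-up Slovene factor bundles, training on `[[('je', 'biti', 'Va'), ('bil', 'biti', 'Vp')]]` and then calling `prob('bil', ('je', 'biti', 'Va'))` returns 1.0, since the word-level context was observed; an unseen word-level context would fall back to the lemma or tag level instead of returning zero outright.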

Factored language models
Initial algorithm
Results
Conclusions
