Modeling under-resourced languages for speech recognition

Mikko Kurimo, Seppo Enarvi, Tanel Alumäe, Matti Varjokallio, Ottokar Tilk, André Mansikkaniemi

doi:10.1007/s10579-016-9336-9

Abstract

One particular problem in large vocabulary continuous speech recognition for low-resourced languages is finding relevant training data for the statistical language models. A large amount of data is required because the models should estimate the probability of all possible word sequences. For Finnish, Estonian and the other Finno-Ugric languages, a special problem with the data is the huge number of different word forms that are common in normal speech. The same problem also exists in other language technology applications, such as machine translation and information retrieval, and to some extent in other morphologically rich languages. In this paper we present methods and evaluations in four recent language modeling topics: selecting conversational data from the Internet, adapting models for foreign words, multi-domain and adapted neural network language modeling, and decoding with subword units. Our evaluations show that the same methods work in more than one language and that they scale down to smaller data resources.
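The data-sparsity problem described in the abstract can be illustrated with a toy example. The following is a minimal sketch, not the authors' method: it assumes a plain word-level bigram model with add-one smoothing, and the function names (`train_bigram`, `log_prob`) and the two-sentence Finnish corpus are hypothetical, chosen only for illustration. It shows that an inflected form absent from the training data ("talossani", "in my house") receives only the smoothing floor even though a closely related form ("talossa", "in the house") was seen.

```python
from collections import Counter
import math


def train_bigram(sentences):
    """Count histories and bigrams over whitespace-tokenized sentences."""
    histories, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for sent in sentences:
        words = sent.split()
        vocab.update(words)
        tokens = ["<s>"] + words + ["</s>"]
        histories.update(tokens[:-1])                  # every token that precedes another
        bigrams.update(zip(tokens[:-1], tokens[1:]))   # adjacent word pairs
    return histories, bigrams, vocab


def log_prob(sentence, histories, bigrams, vocab):
    """Log-probability of a sentence under an add-one smoothed bigram model."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    v = len(vocab) + 1  # one extra slot reserved for any unseen word
    total = 0.0
    for h, w in zip(tokens[:-1], tokens[1:]):
        total += math.log((bigrams[(h, w)] + 1) / (histories[h] + v))
    return total


# Hypothetical toy corpus: "I live in the house", "you live in the house".
corpus = ["minä asun talossa", "sinä asut talossa"]
histories, bigrams, vocab = train_bigram(corpus)

seen = log_prob("minä asun talossa", histories, bigrams, vocab)
unseen = log_prob("minä asun talossani", histories, bigrams, vocab)
print(f"seen form:   {seen:.3f}")
print(f"unseen form: {unseen:.3f}")
```

Because each inflected form is treated as a separate vocabulary item, the model cannot share statistics between "talossa" and "talossani". Splitting words into smaller units is one way to share them, which is the motivation for the subword-unit decoding listed among the paper's topics.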

Journal: Language Resources and Evaluation
Publication Date: Feb 10, 2016
Citations: 15

