Pipelined language model construction for Polish speech recognition

Jerzy Sas, Andrzej Żołnierek

doi:10.2478/amcs-2013-0049

Abstract

The aim of the work described in this article is to develop and experimentally evaluate a consistent method of Language Model (LM) construction for Polish speech recognition. The proposed method takes into account the features and specific problems encountered in practical applications of speech recognition in Polish: rich inflection, loose word order and the tendency for short word deletion. The LM is created in five stages. Each successive stage takes the model prepared at the previous stage and modifies or extends it so as to improve its properties. At the first stage, typical methods of LM smoothing are used to create the initial model. The four most frequently used methods of LM construction are considered here. At the second stage, the model is extended to take into account words indirectly co-occurring in the corpus. At the third stage, LM modifications are aimed at reducing short word deletion errors, which occur frequently in Polish speech recognition. The fourth stage extends the model by inserting words that were not observed in the corpus. Finally, the model is modified so as to ensure highly accurate recognition of very important utterances. The performance of the methods applied is tested in four language domains.
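The staged construction described in the abstract can be pictured as a chain of functions, each taking the model from the previous stage and returning a modified one. The Python sketch below is only an illustration under stated assumptions: all function names, the dictionary layout of the model, the add-k smoothing, the co-occurrence window and the fractional weight are hypothetical choices, not the authors' method. Only stages one, two and four are sketched; the abstract gives too little detail on the short word deletion and important utterance stages to illustrate them.

```python
from collections import Counter
from functools import partial, reduce


def build_initial_model(corpus, k=1.0):
    """Stage 1 (assumed): bigram counts with add-k smoothing as a stand-in."""
    bigrams = Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        bigrams.update(zip(tokens, tokens[1:]))
    vocab = {word for pair in bigrams for word in pair}
    return {"bigrams": bigrams, "vocab": vocab, "k": k}


def add_distant_cooccurrences(model, corpus, window=3, weight=0.5):
    """Stage 2 (assumed): fractional counts for words separated by other words."""
    bigrams = Counter(model["bigrams"])
    for sentence in corpus:
        for i, first in enumerate(sentence):
            # Pairs at distance 2..window; adjacent pairs are already counted.
            for second in sentence[i + 2 : i + window + 1]:
                bigrams[(first, second)] += weight
    return {**model, "bigrams": bigrams}


def add_unseen_words(model, extra_words):
    """Stage 4 (assumed): extend the vocabulary with words absent from the corpus."""
    return {**model, "vocab": model["vocab"] | set(extra_words)}


def probability(model, prev, word):
    """Add-k smoothed estimate of P(word | prev) over the current vocabulary."""
    context_total = sum(
        count for (p, _), count in model["bigrams"].items() if p == prev
    )
    k, size = model["k"], len(model["vocab"])
    return (model["bigrams"][(prev, word)] + k) / (context_total + k * size)


def run_pipeline(initial_model, stages):
    """Apply each stage in order; every stage maps a model to a new model."""
    return reduce(lambda model, stage: stage(model), stages, initial_model)


# Toy corpus and extra word, invented for illustration only.
corpus = [
    ["pacjent", "ma", "wysoką", "gorączkę"],
    ["pacjent", "ma", "kaszel"],
]
stages = [
    partial(add_distant_cooccurrences, corpus=corpus),
    partial(add_unseen_words, extra_words=["katar"]),
]
model = run_pipeline(build_initial_model(corpus), stages)
```

Because every stage returns a new model rather than mutating its input, stages can be reordered, dropped or evaluated one at a time, which matches the abstract's description of each stage improving on the model from the previous one.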

Journal: International Journal of Applied Mathematics and Computer Science
Publication Date: Sep 1, 2013
Citations: 19
License type: cc-by-nc-nd

