Large vocabulary speech recognition of Slovenian language using morphological models

M Maucec, Z Kacic, B Horvat, T Rotovnik

doi:10.1109/eurcon.2003.1248172

Abstract

This paper concerns the development of an automatic speech recognition system for the Slovenian language. The large number of unique words in inflected languages is identified as the primary reason for performance degradation. The paper discusses statistical language models and examines a novel variation on n-gram modelling in which the modelling units are stems and endings instead of words. Words are decomposed into stems and endings automatically, using only data-driven algorithms. Using stems and endings to model Slovenian significantly reduces the out-of-vocabulary (OOV) rate. We also discuss corpus-based topic-adapted language models. Language models are most often used in a homogeneous topic environment. Topic detection in a highly inflected language is complicated by the appearance of several word forms derived from the same lemma. The problem is solved by using data-driven algorithms to group words of the same lemma into classes.
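
The effect of stem-and-ending units on the OOV rate can be illustrated with a small sketch. The abstract does not describe the decomposition algorithm itself, so the heuristic below is an assumption made purely for illustration: a word-final string counts as an ending if it attaches to at least two distinct stems in the training vocabulary. The function names (learn_endings, split, oov_rate), the parameters, and the toy word lists are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def learn_endings(word_types, max_len=3, min_stem_len=2, min_stems=2):
    """Return word-final strings seen with at least `min_stems` distinct stems.

    Illustrative heuristic only; not the algorithm used in the paper.
    """
    stems_per_ending = defaultdict(set)
    for word in word_types:
        for k in range(1, max_len + 1):
            stem, ending = word[:-k], word[-k:]
            if len(stem) >= min_stem_len:
                stems_per_ending[ending].add(stem)
    return {e for e, stems in stems_per_ending.items() if len(stems) >= min_stems}


def split(word, endings, max_len=3, min_stem_len=2):
    """Split a word into [stem, '+ending'], preferring the longest known ending."""
    for k in range(max_len, 0, -1):
        if len(word) - k >= min_stem_len and word[-k:] in endings:
            return [word[:-k], "+" + word[-k:]]
    return [word]  # no known ending: keep the word as a single unit


def oov_rate(test_units, vocab):
    """Fraction of test units that do not occur in the training vocabulary."""
    return sum(u not in vocab for u in test_units) / len(test_units)


# Toy data (hypothetical): a few inflected forms of three Slovenian nouns.
train = ["hiša", "hiše", "hiši", "miza", "mize", "knjiga", "knjige", "knjigi"]
test = ["mizi", "hiša", "knjige"]

# Word-level OOV rate: "mizi" never occurs as a whole word in training.
word_oov = oov_rate(test, set(train))

# Unit-level OOV rate: the same form is covered by the units "miz" and "+i".
endings = learn_endings(set(train))
unit_vocab = {u for w in train for u in split(w, endings)}
test_units = [u for w in test for u in split(w, endings)]
unit_oov = oov_rate(test_units, unit_vocab)

print(f"word OOV rate: {word_oov:.2f}, stem/ending OOV rate: {unit_oov:.2f}")
```

In this toy example the unseen form "mizi" is out of vocabulary as a whole word, while its stem and ending each occur in training, which is the kind of OOV reduction the abstract reports. Endings are prefixed with "+" so that they cannot be confused with stems in the unit vocabulary.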
