Dynamic language modeling for European Portuguese

Ciro Martins, António Teixeira, João Neto

doi:10.1016/j.csl.2010.02.003

Abstract

This paper reports on work on daily vocabulary and language model adaptation for a European Portuguese broadcast news transcription system. The proposed adaptation framework takes into account characteristics of European Portuguese, such as its high level of inflection and complex verbal system. A multi-pass speech recognition framework is proposed that uses contemporary written texts available daily on the Web. It uses morpho-syntactic knowledge (part-of-speech information) about an in-domain training corpus for the daily selection of an optimal vocabulary. Using an information retrieval engine, with the automatic speech recognition (ASR) hypotheses as query material, relevant documents are extracted from a large, dynamic dataset to generate a story-based language model. When applied to a daily live closed-captioning system for TV broadcasts, the framework was shown to be effective, yielding relative reductions of 69% in out-of-vocabulary (OOV) word rate and 12.0% in word error rate (WER) compared with a baseline system using the same vocabulary size.
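The two headline figures are relative reductions, so it may help to see how such numbers are typically computed. The sketch below is only an illustration under stated assumptions: the function names `oov_rate` and `relative_reduction` are hypothetical, the token lists and vocabularies are made-up toy data, and nothing here is taken from the authors' system or evaluation setup.

```python
def oov_rate(reference_tokens, vocabulary):
    """Fraction of running words in the reference that are not in the vocabulary."""
    if not reference_tokens:
        return 0.0
    missing = sum(1 for token in reference_tokens if token not in vocabulary)
    return missing / len(reference_tokens)


def relative_reduction(baseline, adapted):
    """Relative reduction of an error-like metric: (baseline - adapted) / baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - adapted) / baseline


# Toy data (assumed, not from the paper): a four-word reference transcript.
reference = ["o", "governo", "anunciou", "medidas"]
baseline_vocab = {"o", "governo"}               # misses two of four words
adapted_vocab = {"o", "governo", "anunciou"}    # misses one of four words

baseline_oov = oov_rate(reference, baseline_vocab)
adapted_oov = oov_rate(reference, adapted_vocab)
oov_gain = relative_reduction(baseline_oov, adapted_oov)
```

With this toy data the baseline OOV rate is 0.5, the adapted rate is 0.25, and the relative reduction is 0.5. The same `relative_reduction` helper would apply to WER values, provided both systems are scored on the same reference transcripts.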

Journal: Computer Speech & Language
Publication Date: Feb 19, 2010
Citations: 25

Similar Papers

MediAlbertina: An European Portuguese medical language model
Miguel Nunes ... Luis B Elvas
Computers in Biology and Medicine | VOL. 182
02 Oct 2024

Handwriting recognition in historical documents using very large vocabularies
Volkmar Frinken ... Andreas Fischer
24 Aug 2013

Dynamic language modeling for a daily broadcast news transcription system
Ciro Martins ... Antonio Teixeira
01 Jan 2007

Vocabulary selection for a broadcast news transcription system using a morpho-syntactic approach
Ciro Martins ... João Neto
27 Aug 2007
