Abstract

This paper presents the main improvements recently brought to the large-vocabulary continuous speech recognition (LVCSR) system for the Romanian language developed by the Speech and Dialogue (SpeeD) research laboratory. While the most important improvement is the use of DNN-based acoustic models instead of the classic HMM-GMM approach, several other aspects are discussed in the paper: a significant extension of the speech training corpus, the use of additional feature-processing algorithms, speaker adaptive training, discriminative training, and, finally, lattice rescoring with significantly expanded language models (n-gram models of up to order 5, based on vocabularies of up to 200k words). The ASR experiments were performed with several types of acoustic and language models in different configurations, on the standard read and conversational speech corpora created by SpeeD in 2014. The results show that the extension of the training speech corpus leads to a relative word error rate (WER) improvement between 15% and 17%, while the use of DNN-based acoustic models instead of HMM-GMM-based acoustic models leads to a relative WER improvement between 18% and 23%, depending on the nature of the evaluation speech corpus (read or conversational, clean or noisy). The best configuration of the LVCSR system was integrated into a live transcription web application, available online on the SpeeD laboratory's website at https://speed.pub.ro/live-transcriber-2017.
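For clarity, the minimal sketch below (Python, not part of the paper) illustrates how the relative WER improvements quoted above are computed from absolute WER values; the numeric inputs are hypothetical and serve only to show the arithmetic.

```python
# Minimal sketch (not from the paper): how a relative WER improvement,
# such as the 18-23% reported for DNN vs. HMM-GMM acoustic models,
# is derived from two absolute WER figures. All numbers below are
# hypothetical and chosen only for illustration.

def wer(substitutions: int, deletions: int, insertions: int, reference_words: int) -> float:
    """Word error rate: (S + D + I) / N, where N is the number of reference words."""
    return (substitutions + deletions + insertions) / reference_words

def relative_improvement(baseline_wer: float, improved_wer: float) -> float:
    """Relative WER reduction of the improved system over the baseline."""
    return (baseline_wer - improved_wer) / baseline_wer

if __name__ == "__main__":
    baseline = 0.20   # hypothetical HMM-GMM system WER (20%)
    improved = 0.16   # hypothetical DNN system WER (16%)
    print(f"Relative WER improvement: {relative_improvement(baseline, improved):.1%}")
    # -> Relative WER improvement: 20.0% (within the 18-23% range reported above)
```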
