Recent improvements of the SpeeD Romanian LVCSR system

Horia Cucu, Dragos Burileanu, Andi Buzo, Lucian Petrica, Corneliu Burileanu

doi:10.1109/iccomm.2014.6866659

Abstract

This paper presents the main improvements recently made to the SpeeD automatic speech recognition system. Several aspects are discussed, such as the acquisition of speech and text resources, noise-robust speech features, and feature transforms. All the updates to our ASR system are accompanied by experimental results illustrating significant improvements: relative WER reductions between 30% and 35% for various case studies (read/spontaneous speech, noisy/clean speech). In the last part of the paper, our ASR system is also compared with Google’s ASR system, and a brief analysis of the results is presented.

Keywords: ASR; LVCSR; noise-robust speech recognition; PNCC; LDA.

I. INTRODUCTION

Large vocabulary continuous speech recognition (LVCSR) is still an unsolved problem for many languages. The reasons for this are that (i) the acoustic and linguistic resources needed for development are lacking (as is the case for so-called under-resourced languages) and (ii) the scientific research community is not stimulated by any national or international evaluation campaigns (as opposed to languages such as English, French or Chinese). The Romanian language is affected by both of these problems. In this context, the development of speech and language resources for automatic speech recognition (ASR) is a critical issue that must be addressed in order to push research forward in this direction and create LVCSR systems comparable to those available for other languages. This is one of the main goals of the Speech and Dialogue (SpeeD) research group

Full Text