Abstract
In this work, we propose different strategies for efficiently integrating an automatic speech recognition module into a dialogue-based vocal system. The aim is to study different ways of improving the quality and robustness of the recognition.

We first concentrate on the choice of acoustic models for speech recognition. Our goal is to evaluate the hypothesis that hybrid acoustic models, in which frame-level phoneme probabilities are estimated by artificial neural networks, perform similarly to classical hidden Markov models with multi-Gaussian estimation, while generalizing more robustly across tasks. We show experimentally that, because of the size of the parameter space to be explored, it is not always practically possible to reach a performance comparable to that of multi-Gaussian models, and that hybrid models in fact often lead to worse recognition performance.

In a second part, we focus on one of the main limitations of state-of-the-art speech recognition: the inability of the one-best approach to reliably yield a hypothesis corresponding to the correct transcription. To address this, we explore the solution of producing, during acoustic decoding, a word lattice containing a very large number of hypotheses, which is then filtered by a syntactic analyzer relying on more sophisticated syntactic models, such as stochastic context-free grammars. The goal of this approach is to yield syntactically correct hypotheses for further processing. More precisely, we study the approach of dynamically tuning the relative importance of the acoustic and language models, which increases the lexical and syntactic variability present in the word lattice. We identify and experimentally quantify two important drawbacks of this approach: its high computational cost and the impossibility of guaranteeing that, in practice, the correct solution is indeed present in the lattice.

Finally, we study the problem that generic linguistic resources (language models and phonetic lexica) are inadequate for robust and efficient recognition. In this context, we explore the integration of dynamic phonetic lexica and language models controlled by an associated dialogue model: restricted lexica and language models that depend on the dialogue context are used in place of the complete ones. We first verify experimentally that this approach indeed yields a significant increase in speech recognition performance, and we then focus on the problem of producing, for a given application, an adequate dialogue model that can efficiently integrate the speech recognition module. In this perspective, we propose an enhancement of the dialogue model prototyping methodology that integrates speech recognition error simulation into the Wizard-of-Oz dialogue simulation. We show that this approach enables a more complete prototyping of the dialogue model and guarantees a better adequacy of the resulting model to the targeted vocal application.
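As an illustration of the acoustic/language model tuning described above (the notation here is a standard formulation, not taken from the thesis), decoding can be viewed as searching for the hypothesis

    \hat{W} = \arg\max_{W} \big[ \log P(X \mid W) + \alpha \log P(W) + \beta\,|W| \big]

where X denotes the acoustic observations, \alpha is the language model scale factor, and \beta a word insertion penalty. Lowering \alpha during lattice generation weakens the language model constraint and thereby increases the lexical and syntactic variability of the resulting lattice, at the cost of a larger search space.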
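The dialogue-model-controlled resources can be pictured with a small sketch. The Python fragment below is purely illustrative and assumes a hypothetical mapping from dialogue states to restricted lexica and bigram language models; none of the state names or data come from the thesis.

    # Illustrative sketch only: dialogue-state-dependent selection of a
    # restricted lexicon and language model (all names are hypothetical).

    FULL_RESOURCES = {
        "lexicon": set(),   # stands in for the full, generic application lexicon
        "bigrams": {},      # stands in for the full, generic bigram language model
    }

    # Each dialogue state activates a small, context-dependent lexicon
    # and language model in place of the complete, generic ones.
    STATE_RESOURCES = {
        "ask_departure_city": {
            "lexicon": {"paris", "geneva", "lausanne", "from"},
            "bigrams": {("from", "geneva"): 0.5, ("from", "lausanne"): 0.5},
        },
        "ask_travel_date": {
            "lexicon": {"monday", "tuesday", "today", "tomorrow", "on"},
            "bigrams": {("on", "monday"): 0.5, ("on", "tuesday"): 0.5},
        },
    }

    def resources_for(state):
        """Return the restricted resources for the current dialogue state,
        falling back to the complete ones when the state is unknown."""
        return STATE_RESOURCES.get(state, FULL_RESOURCES)

The key design point is the fallback: when the dialogue model cannot predict the user's next utterance, the recognizer reverts to the complete resources rather than failing.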
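Similarly, the speech recognition error simulation used during Wizard-of-Oz prototyping can be sketched as a simple stochastic corruption of the wizard's transcription. The function below is a minimal sketch under our own assumptions (uniform substitution, deletion, and insertion errors at a target word error rate); the error model actually used in the thesis may differ.

    import random

    def simulate_asr_errors(words, wer=0.2, lexicon=("paris", "geneva", "monday")):
        """Corrupt a wizard-typed transcription with deletion, substitution and
        insertion errors at roughly the given word error rate (illustrative only)."""
        out = []
        for w in words:
            r = random.random()
            if r < wer / 3:
                continue                            # deletion
            elif r < 2 * wer / 3:
                out.append(random.choice(lexicon))  # substitution
            else:
                out.append(w)                       # word kept
                if r < wer:
                    out.append(random.choice(lexicon))  # insertion after the word
        return out

    # Feed the corrupted hypothesis, not the exact wizard input, to the
    # dialogue manager during Wizard-of-Oz simulation:
    print(simulate_asr_errors("i want to fly to geneva on monday".split()))

Exposing the dialogue model to such corrupted input during prototyping is what allows its recovery strategies to be exercised before the real recognizer is integrated.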