Reverberant speech recognition exploiting clarity index estimation

Pablo Peso Parada, Toon Van Waterschoot, Dushyant Sharma, Patrick A Naylor

doi:10.1186/s13634-015-0237-7

Abstract

We present single-channel approaches to robust automatic speech recognition (ASR) in reverberant environments based on non-intrusive estimation of the clarity index (C50). Our best-performing method includes the estimated value of C50 in the ASR feature vector and also uses C50 to select the most suitable ASR acoustic model according to the reverberation level. We evaluate our method on the REVERB Challenge database using two different C50 estimators and show that it outperforms the best challenge baseline obtained without unsupervised acoustic model adaptation, i.e. the baseline using multi-condition hidden Markov models (HMMs). Our approach achieves a 22.4% relative word error rate reduction compared with this baseline.
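The abstract describes two uses of the C50 estimate: as an extra dimension of the ASR feature vector and as a selector among acoustic models trained for different reverberation levels. The following is a minimal sketch of both ideas, not the authors' implementation. It assumes one C50 estimate per utterance, appended to every frame of a (frames x dims) feature matrix; the C50 bin boundaries and the model names (am_high_reverb, am_mid_reverb, am_low_reverb) are hypothetical placeholders, not values from the paper.

```python
import numpy as np

# Hypothetical C50 ranges (dB) mapped to reverberation-specific acoustic models.
# Boundaries and names are illustrative assumptions, not taken from the paper.
MODEL_BINS = [
    (float("-inf"), 5.0, "am_high_reverb"),  # low clarity = strong reverberation
    (5.0, 15.0, "am_mid_reverb"),
    (15.0, float("inf"), "am_low_reverb"),   # high clarity = mild reverberation
]


def append_c50(features, c50_db):
    """Append an utterance-level C50 estimate as an extra column to every frame.

    features: array of shape (num_frames, dim), e.g. MFCCs with deltas.
    Returns an array of shape (num_frames, dim + 1).
    """
    features = np.asarray(features, dtype=float)
    c50_column = np.full((features.shape[0], 1), float(c50_db))
    return np.hstack([features, c50_column])


def select_acoustic_model(c50_db, bins=MODEL_BINS):
    """Return the name of the acoustic model whose C50 range contains c50_db."""
    for low, high, name in bins:
        if low <= c50_db < high:
            return name
    raise ValueError(f"No acoustic model covers C50 = {c50_db} dB")


if __name__ == "__main__":
    mfcc = np.zeros((200, 39))  # placeholder feature matrix
    c50_hat = 8.2               # would come from a non-intrusive C50 estimator
    print(append_c50(mfcc, c50_hat).shape)  # (200, 40)
    print(select_acoustic_model(c50_hat))   # am_mid_reverb
```

Keeping feature augmentation and model selection as separate functions mirrors the abstract, which presents them as two complementary uses of the same estimate; either can be used on its own.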

Highlights

The ASR evaluation tool, provided by the REVERB Challenge, is based on the Hidden Markov Model Toolkit (HTK).

Summary

Introduction

Automatic speech recognition (ASR) is increasingly used in a wide range of applications and in diverse acoustic conditions (e.g. health care transcription, automatic translation, voicemail-to-text, and voice interfaces for command and control). Distant speech recognition is essential for natural and comfortable human-machine voice interfaces, such as those used in the automotive sector and in smartphone applications. In enclosed spaces, reverberant sound is created by reflections from surfaces, which give rise to multipath propagation from the source to the receiver. This effect depends on the acoustic properties of the room and on the source-receiver distance, and it is characterized by the room impulse response (RIR).
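The clarity index C50 is a standard room-acoustic parameter derived from the RIR: the ratio, in dB, of the energy arriving within the first 50 ms after the direct sound to the energy arriving afterwards, C50 = 10 log10(E_early / E_late). The paper estimates C50 non-intrusively, i.e. from reverberant speech without access to the RIR, so this sketch shows only the intrusive reference computation from a known RIR, the quantity a non-intrusive estimator tries to approximate. Taking the largest-magnitude sample as the direct-path arrival and the exponentially decaying noise RIR (synthetic_rir) are assumptions made for illustration.

```python
from typing import Optional

import numpy as np


def clarity_index_c50(rir, fs: int, direct_idx: Optional[int] = None) -> float:
    """Compute C50 in dB from a room impulse response sampled at fs Hz."""
    rir = np.asarray(rir, dtype=float)
    if direct_idx is None:
        # Assumption: the direct path is the largest-magnitude sample.
        direct_idx = int(np.argmax(np.abs(rir)))
    h = rir[direct_idx:]
    split = int(round(0.050 * fs))  # 50 ms boundary in samples
    early = float(np.sum(h[:split] ** 2))
    late = float(np.sum(h[split:] ** 2))
    if late == 0.0:
        return float("inf")  # no late energy: perfectly "clear" response
    return float(10.0 * np.log10(early / late))


def synthetic_rir(t60: float, fs: int, length_s: float = 1.5, seed: int = 0) -> np.ndarray:
    """Hypothetical test RIR: Gaussian noise with an exponential decay set by T60."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(length_s * fs)) / fs
    # Energy falls by 60 dB over t60 seconds -> amplitude envelope exp(-3 ln(10) t / t60).
    envelope = np.exp(-3.0 * np.log(10.0) * t / t60)
    return rng.standard_normal(t.size) * envelope


if __name__ == "__main__":
    fs = 16000
    for t60 in (0.3, 0.6, 0.9):
        c50 = clarity_index_c50(synthetic_rir(t60, fs), fs, direct_idx=0)
        print(f"T60 = {t60:.1f} s -> C50 = {c50:.1f} dB")
```

For an ideal exponential decay, C50 depends only on T60: a noise-free envelope with T60 = 0.5 s gives about 4.7 dB, and longer reverberation times give lower C50, i.e. less clear speech.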


Journal: EURASIP Journal on Advances in Signal Processing
Publication Date: Jul 1, 2015
Citations: 26
License type: CC BY 4.0


Similar Papers

Inversion-based nonlinear adaptation of noisy acoustic parameters for a neural/HMM speech recognizer
Edmondo Trentin ... Marco Gori
Neurocomputing, Vol. 70, 27 Jun 2006

Subband Temporal Envelope Features and Data Augmentation for End-to-end Recognition of Distant Conversational Speech
Cong-Thanh Do
01 May 2019

Two-stage lexicon optimization of G2P-converted pronunciation dictionary based on statistical acoustic confusability measure
Nam Kyun Kim ... Hong Kook Kim
01 Dec 2015

Robust combination of neural networks and hidden markov models for speech recognition
E Trentin ... M Gori
IEEE Transactions on Neural Networks, Vol. 14, 01 Nov 2003
