Abstract
The REMOS (REverberation MOdeling for Speech recognition) concept for reverberation-robust distant-talking speech recognition, introduced in "Distant-talking continuous speech recognition based on a novel reverberation model in the feature domain" (A. Sehr et al., in Proc. Interspeech, 2006, pp. 769-772) for melspectral features, is extended to logarithmic melspectral (logmelspec) features in this contribution. Thus, the favorable properties of REMOS, including its high flexibility with respect to changing reverberation conditions, become available in the more competitive logmelspec domain. Based on a combined acoustic model consisting of a hidden Markov model (HMM) network and a reverberation model (RM), REMOS determines clean-speech and reverberation estimates during recognition. To this end, in each iteration of a modified Viterbi algorithm, an inner optimization operation maximizes the joint density of the current HMM output and the RM output subject to the constraint that their combination equals the current reverberant observation. Since the combination operation in the logmelspec domain is nonlinear, numerical methods are needed to solve the constrained inner optimization problem. This paper derives a novel reformulation of the constraint that allows for an efficient solution by nonlinear optimization algorithms, making a practicable implementation of REMOS for logmelspec features possible. An in-depth analysis of this implementation investigates the statistical properties of its reverberation estimates and thereby identifies possibilities for further improving the performance of REMOS. Connected-digit recognition experiments show that the proposed REMOS version in the logmelspec domain significantly outperforms the melspec version. While RMs whose parameters are estimated by straightforward training for a given room are robust to a mismatch of the speaker-microphone distance, their performance decreases significantly if they are used in a room with substantially different acoustic conditions. However, by training multi-style RMs with data from several rooms, good performance can be achieved across different rooms.
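To make the constrained inner optimization concrete, the following is a minimal sketch, not the paper's actual implementation or its novel constraint reformulation. It assumes diagonal Gaussian densities for the HMM output s and the RM output r, and an element-wise log-sum combination log(e^s + e^r) = y in the logmelspec domain for a single frame. Under these assumptions the constraint can be eliminated by the substitution r = log(e^y - e^s), which is well defined for s < y, leaving an unconstrained problem over s that a standard nonlinear optimizer can solve. All names, the combination model, and the Gaussian assumption are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def inner_optimization(y, mu_s, var_s, mu_r, var_r):
    """Illustrative sketch: maximize the joint diagonal-Gaussian density of
    the HMM output s and the RM output r subject to the (assumed)
    element-wise combination constraint log(exp(s) + exp(r)) = y.

    The constraint is eliminated by substituting r = log(exp(y) - exp(s)),
    valid for s < y, so only an unconstrained minimization of the negative
    log joint density over s remains.
    """
    def neg_log_joint(s):
        # Constraint-eliminating substitution; positive argument since s < y.
        r = np.log(np.exp(y) - np.exp(s))
        # Negative log density up to additive constants (variances are fixed).
        return 0.5 * np.sum((s - mu_s) ** 2 / var_s
                            + (r - mu_r) ** 2 / var_r)

    # Start slightly below the observation so the substitution is defined.
    s0 = np.minimum(mu_s, y - 1.0)
    # Enforce s < y element-wise via box constraints.
    bounds = [(None, yi - 1e-6) for yi in y]
    res = minimize(neg_log_joint, s0, bounds=bounds, method="L-BFGS-B")
    s_hat = res.x
    r_hat = np.log(np.exp(y) - np.exp(s_hat))
    return s_hat, r_hat  # clean-speech and reverberation estimates
```

In a decoder following this scheme, such a routine would be called once per frame and per active HMM state inside the modified Viterbi recursion, with the resulting joint density contributing to the path score; the per-dimension means and variances here stand in for whatever densities the HMM and RM actually provide.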