Feature enhancement of reverberant speech by distribution matching and non-negative matrix factorization

Sami Keronen, Jort F Gemmeke, Heikki Kallasjoki, Kalle J Palomäki, Guy J Brown

doi:10.1186/s13634-015-0259-1

Abstract

This paper describes a novel two-stage dereverberation feature enhancement method for noise-robust automatic speech recognition. In the first stage, an estimate of the dereverberated speech is generated by matching the distribution of the observed reverberant speech to that of clean speech, in a decorrelated transformation domain that has a long temporal context in order to address the effects of reverberation. The second stage uses this dereverberated signal as an initial estimate within a non-negative matrix factorization framework, which jointly estimates a sparse representation of the clean speech signal and an estimate of the convolutional distortion. The proposed feature enhancement method, when used in conjunction with automatic speech recognizer back-end processing, is shown to improve the recognition performance compared to three other state-of-the-art techniques.
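
The first stage can be illustrated with a minimal sketch of distribution matching as quantile mapping. It works on a single scalar feature stream, whereas the paper matches distributions in a decorrelated transformation domain with a long temporal context. The function name `match_distribution` and the toy values are assumptions made for illustration, not the authors' implementation.

```python
from bisect import bisect_right


def match_distribution(observed, clean_reference):
    """Map each observed value to the clean value at the same empirical quantile.

    Hypothetical helper: a scalar quantile-mapping sketch, not the paper's method.
    """
    obs_sorted = sorted(observed)
    ref_sorted = sorted(clean_reference)
    n_obs, n_ref = len(obs_sorted), len(ref_sorted)
    matched = []
    for value in observed:
        # Rank of the value among the observed samples (1..n_obs),
        # i.e. the empirical CDF scaled by n_obs.
        rank = bisect_right(obs_sorted, value)
        # Inverse empirical CDF of the clean reference at the same quantile,
        # computed with integer arithmetic: ceil(rank * n_ref / n_obs) - 1.
        index = (rank * n_ref + n_obs - 1) // n_obs - 1
        matched.append(ref_sorted[index])
    return matched


# Toy values (assumed): the largest observed value maps to the largest clean value.
print(match_distribution([5.0, 1.0, 3.0], [10.0, 30.0, 20.0]))  # [30.0, 10.0, 20.0]
```

The mapping is monotonic, so it preserves the ordering of the observed values while forcing their histogram onto that of the clean reference.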

Highlights

Automatic speech recognition (ASR) is becoming an effective and versatile way to interact with modern machine interfaces.
Previous studies have attempted to counteract the convolutional distortion caused by reverberation using a number of denoising methods, such as frequency domain linear prediction [3], modulation filtered spectrograms [4], or missing-data mask estimation designed for dereverberation [5].
While x could be used directly as input for a speech recognition system, existing work on non-negative matrix factorization (NMF)-based source separation for speech in additive noise [13] obtained better performance by using the same Wiener-filtering approach we have described for the distribution matching (DM)-based initialization.
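
The Wiener-filtering idea in the last highlight can be sketched for the additive-noise setting roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the fixed dictionaries `W_speech` and `W_noise`, the sparsity weight, and the KL-divergence multiplicative update are all choices made here for the example, and the convolutional distortion estimated in the paper is not modelled.

```python
import numpy as np


def sparse_activations(V, W, sparsity=0.1, n_iter=200, eps=1e-12):
    """Estimate non-negative activations H so that V is approximated by W @ H.

    Uses multiplicative updates for the KL divergence with an L1 penalty
    on H and a fixed dictionary W (assumed setup).
    """
    H = np.ones((W.shape[1], V.shape[1]))
    col_sums = W.sum(axis=0)[:, None]  # equals W.T @ ones, one value per atom
    for _ in range(n_iter):
        ratio = V / (W @ H + eps)
        H *= (W.T @ ratio) / (col_sums + sparsity)
    return H


def wiener_enhance(V, W_speech, W_noise, **kwargs):
    """Apply a Wiener-like gain built from the speech and noise reconstructions."""
    W = np.hstack([W_speech, W_noise])
    H = sparse_activations(V, W, **kwargs)
    k = W_speech.shape[1]
    speech = W_speech @ H[:k]   # part of V explained by speech atoms
    noise = W_noise @ H[k:]     # part of V explained by noise atoms
    gain = speech / (speech + noise + 1e-12)  # values in [0, 1]
    return gain * V


# Toy data (assumed): random non-negative dictionaries and a mixture spectrogram.
rng = np.random.default_rng(0)
W_s, W_n = rng.random((6, 3)), rng.random((6, 2))
V = W_s @ rng.random((3, 5)) + W_n @ rng.random((2, 5))
enhanced = wiener_enhance(V, W_s, W_n, n_iter=50)
```

Filtering the observation with a gain, rather than using the speech reconstruction directly, keeps detail from the observed spectrogram that the dictionary cannot represent exactly.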

Summary

Introduction

Automatic speech recognition (ASR) is becoming an effective and versatile way to interact with modern machine interfaces. For instance, in [2] it was shown that even with state-of-the-art DNN systems [...]. Previous studies have attempted to counteract the convolutional distortion caused by reverberation using a number of denoising methods, such as frequency domain linear prediction [3], modulation filtered spectrograms [4], or missing-data mask estimation designed for dereverberation [5]. All of these approaches make weak assumptions about the reverberant data (e.g., they do not require that the room impulse response is known), but they achieve only a moderate increase in ASR performance. In conditions with relatively long reverberation times, REMOS provides higher recognition accuracy than a matched model.

Journal: EURASIP Journal on Advances in Signal Processing
Publication Date: Aug 20, 2015
Citations: 10
License type: CC BY 4.0

Similar Papers

Pure kernel graph fusion tensor subspace clustering under non-negative matrix factorization framework
Shuai Zhao ... Zhen Tan
Information Processing and Management | VOL. 61
30 Nov 2023

Learning Inter- and Intra-manifolds for Matrix Factorization-based Multi-Aspect Data Clustering
Khanh Luong ... Richi Nayak
IEEE Transactions on Knowledge and Data Engineering
01 Jan 2020

Noise adaptive training using a vector Taylor series approach for noise robust automatic speech recognition
Ozlem Kalinli ... Alex Acero
01 Apr 2009

Improved self-paced learning framework for nonnegative matrix factorization
Xiangxiang Zhu ... Zhuosheng Zhang
Pattern Recognition Letters | VOL. 97
17 Jun 2017
