Reverberation Power Spectral Density Research Articles

Many speech dereverberation techniques require an estimate of the late reverberation power spectral density (PSD). State-of-the-art multichannel methods for estimating the late reverberation PSD typically rely on three quantities: an estimate of the relative transfer functions (RTFs) of the target signal, a model for the spatial coherence matrix of the late reverberation, and an estimate of the reverberant speech (or reverberant and noisy speech) PSD matrix. All three are prone to modeling and estimation errors in practice, and the RTFs are particularly difficult to estimate accurately, especially in highly reverberant and noisy scenarios. Recently, we proposed an eigenvalue decomposition (EVD)-based late reverberation PSD estimator, which does not require an estimate of the RTFs. In this paper, this EVD-based PSD estimator is further analyzed, and its estimation accuracy and computational complexity are analytically compared to those of a state-of-the-art maximum likelihood (ML)-based PSD estimator. It is shown that for perfect knowledge of the RTFs, the spatial coherence matrix, and the reverberant speech PSD matrix, the ML-based and the EVD-based PSD estimates are both equal to the true late reverberation PSD. In addition, it is shown that for erroneous RTFs but perfect knowledge of the spatial coherence matrix and the reverberant speech PSD matrix, the ML-based PSD estimate is larger than or equal to the true late reverberation PSD, whereas the EVD-based PSD estimate, which does not depend on the RTFs, is still equal to the true late reverberation PSD. Finally, it is shown that when modeling and estimation errors occur in all quantities, the ML-based PSD estimate is larger than or equal to the EVD-based PSD estimate. Simulation results for several realistic acoustic scenarios demonstrate the advantages of using the EVD-based PSD estimator in a multichannel Wiener filter, yielding significantly better performance than the ML-based PSD estimator.
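The abstract does not give the estimator's formula, so the following is only a minimal sketch of one way an EVD-based estimate can avoid the RTFs. It assumes the commonly used signal model in which the reverberant speech PSD matrix is Φ_y = φ_s d d^H + φ_r Γ, with target PSD φ_s, RTF vector d, late reverberation PSD φ_r, and spatial coherence matrix Γ. Under that assumption, Γ⁻¹Φ_y has one large eigenvalue and M − 1 eigenvalues equal to φ_r, so averaging the M − 1 smallest eigenvalues recovers φ_r without using d. The function names, the diffuse coherence model, the array geometry, and all numeric values are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def diffuse_coherence(mic_positions, freq_hz, speed_of_sound=343.0):
    """Spherically isotropic coherence matrix for a linear array (assumed model)."""
    dist = np.abs(mic_positions[:, None] - mic_positions[None, :])
    # np.sinc(x) = sin(pi x) / (pi x), so this equals sin(2 pi f d / c) / (2 pi f d / c).
    return np.sinc(2.0 * freq_hz * dist / speed_of_sound)


def evd_late_reverb_psd(phi_y, gamma):
    """Average the M - 1 smallest eigenvalues of gamma^{-1} phi_y (hypothetical helper)."""
    # Solve gamma @ X = phi_y rather than forming the inverse explicitly.
    prewhitened = np.linalg.solve(gamma, phi_y)
    # The eigenvalues are real in theory; drop numerical imaginary residue and sort.
    eigvals = np.sort(np.linalg.eigvals(prewhitened).real)
    # The largest eigenvalue contains the target speech, so it is excluded.
    return float(np.mean(eigvals[:-1]))


# Synthetic check with perfect knowledge of gamma and phi_y (illustrative values).
rng = np.random.default_rng(0)
num_mics = 4
mic_positions = 0.05 * np.arange(num_mics)  # 5 cm spacing, in metres
gamma = diffuse_coherence(mic_positions, freq_hz=1000.0)

rtf = rng.standard_normal(num_mics) + 1j * rng.standard_normal(num_mics)
rtf = rtf / rtf[0]  # relative to the first (reference) microphone
phi_s_true, phi_r_true = 2.0, 0.5
phi_y = phi_s_true * np.outer(rtf, rtf.conj()) + phi_r_true * gamma

# Note that the RTF vector is never passed to the estimator.
estimate = evd_late_reverb_psd(phi_y, gamma)
print(f"true phi_r: {phi_r_true}, EVD-based estimate: {estimate:.6f}")
```

In this idealized setting the estimate matches the true value up to numerical precision, which is consistent with the abstract's statement for perfect knowledge of the spatial coherence matrix and the PSD matrix. With an estimated PSD matrix or a mismatched coherence model, the small eigenvalues would no longer be identical and the estimate would deviate.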

This paper presents extended techniques for improving automatic speech recognition (ASR) in single-channel scenarios in the context of the REVERB (REverberant Voice Enhancement and Recognition Benchmark) challenge. The focus is on the development and analysis of ASR front-end technologies covering speech enhancement and feature extraction. Speech enhancement is performed using a joint noise reduction and dereverberation system in the spectral domain, based on estimates of the noise and late reverberation power spectral densities (PSDs). To obtain reliable estimates of the PSDs, even in acoustic conditions with positive direct-to-reverberation energy ratios (DRRs), we adopt a statistical model of the room impulse response that explicitly incorporates the DRR, in combination with a newly proposed joint estimator for the reverberation time T60 and the DRR. The feature extraction approach is inspired by processing strategies of the auditory system, where an amplitude modulation filterbank is applied to extract temporal modulation information. These techniques were shown to improve the REVERB baseline in our previous work. Here, we investigate whether similar improvements are obtained when using a state-of-the-art ASR framework, and to what extent the results depend on the specific architecture of the back-end. Apart from conventional Gaussian mixture model (GMM)-hidden Markov model (HMM) back-ends, we consider subspace GMM (SGMM)-HMMs as well as deep neural networks in a hybrid system. The speech enhancement algorithm is found to be helpful in almost all conditions, with the exception of deep learning systems in matched training-test conditions. The auditory feature type improves the baseline for all system architectures. The relative word error rate reduction achieved by combining our front-end techniques with current back-ends is 52.7% on average on the REVERB evaluation test set, compared to our original REVERB result.

Articles published on Reverberation Power Spectral Density

ML Estimation and CRBs for Reverberation, Speech, and Noise PSDs in Rank-Deficient Noise Field

Scoring-Based ML Estimation and CRBs for Reverberation, Speech, and Noise PSDs in a Spatially Homogeneous Noise Field

Analysis of Eigenvalue Decomposition-Based Late Reverberation Power Spectral Density Estimation

Evaluation and Comparison of Late Reverberation Power Spectral Density Estimators

Cramér–Rao Bound Analysis of Reverberation Level Estimators for Dereverberation and Noise Reduction

Front-end technologies for robust ASR in reverberant environments—spectral enhancement-based dereverberation and auditory modulation filterbank features
