Minimum Mean-square Error Log-spectral Amplitude Research Articles

In real-world scenarios, dynamic ambient noise often degrades speech quality, highlighting the need for advanced speech enhancement techniques. Traditional methods, which rely on static embeddings as auxiliary features, struggle to address the complexities of varying noise conditions. To overcome this, we propose a Dual-stream Noise and Speech Information Perception (DNSIP) approach that dynamically detects and processes both noise and speech through information extraction and suppression mechanisms. Non-speech segments predominantly contain environmental noise, while speech segments carry information about the intended speaker. To exploit this distinction as conditions change, real-time voice activity detection (VAD) is employed to differentiate between speech and noise components. Building on the VAD estimates, we propose an information extraction framework that selectively extracts relevant noise and speech features from the noisy input, establishing a dual-stream network for concurrent noise and speech learning. To account for the temporal and spectral variability of noise and speech, a frequency-sequence attention mechanism is integrated, enhancing the model’s ability to learn contextual and spectral dependencies. Additionally, an information suppression module is introduced to minimize cross-stream interference by attenuating noise within the speech stream and suppressing speech content within the noise stream. The derived noise and speech spectrograms are then used to formulate a minimum mean-square error log-spectral amplitude (MMSE-LSA) estimator for robust speech enhancement. Experimental evaluations on the WSJ0 and VCTK+DEMAND datasets demonstrate that our DNSIP approach surpasses existing state-of-the-art methods, underscoring its efficacy in challenging acoustic environments.
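As a rough illustration of that final step, the MMSE-LSA gain of Ephraim and Malah can be computed per time-frequency bin from an a priori SNR and an a posteriori SNR. The sketch below is not the DNSIP implementation: the abstract does not say how the estimated noise and speech spectrograms are mapped to these two SNRs, so the power ratios in the usage lines are an assumption, and the names `exp1` and `lsa_gain` are hypothetical helpers written for this example.

```python
import math

EULER_GAMMA = 0.5772156649015329


def exp1(x: float) -> float:
    """Exponential integral E1(x) for x > 0 (series for x <= 1, continued fraction otherwise)."""
    if x <= 0.0:
        raise ValueError("this sketch only evaluates E1 for x > 0")
    if x <= 1.0:
        # E1(x) = -gamma - ln(x) - sum_{k>=1} (-x)^k / (k * k!)
        total = -EULER_GAMMA - math.log(x)
        term = 1.0
        for k in range(1, 60):
            term *= -x / k          # term == (-x)^k / k!
            total -= term / k
        return total
    # Modified Lentz evaluation of the continued fraction for E1(x), x > 1.
    b = x + 1.0
    c = 1.0e300
    d = 1.0 / b
    h = d
    for i in range(1, 200):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return h * math.exp(-x)


def lsa_gain(xi: float, gamma: float) -> float:
    """MMSE-LSA gain for one bin: G = xi/(1+xi) * exp(0.5 * E1(v)), v = xi/(1+xi) * gamma."""
    wiener = xi / (1.0 + xi)
    v = max(wiener * gamma, 1e-12)  # keep E1's argument strictly positive
    return wiener * math.exp(0.5 * exp1(v))


# Usage (assumed mapping from estimated powers to SNRs, values are made up):
speech_power, noise_power, noisy_power = 4.0, 1.0, 5.0
xi = speech_power / noise_power      # a priori SNR
gamma = noisy_power / noise_power    # a posteriori SNR
gain = lsa_gain(xi, gamma)           # multiply the noisy magnitude by this gain
```

The enhanced magnitude for a bin is the gain times the noisy magnitude; the noisy phase is typically reused when resynthesizing the waveform.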

The present study is concerned with the blind source separation (BSS) of speech and speech-shaped noise sources. All recordings were carried out in an anechoic chamber using a dummy head (two microphones, one in each ear). A program implementing the algorithm for BSS of convolutive mixtures introduced by Parra and Spence [Parra, L., Spence, C., 2000a. Convolutive blind source separation of non-stationary sources. IEEE Trans. Speech Audio Process. 8(3), 320–327 (US Patent US6167417)] was used to separate the signals. In the postprocessing phase, two different denoising algorithms were used. The first was based on a minimum mean-square error log-spectral amplitude estimator [Ephraim, Y., Malah, D., 1985. Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Process. ASSP-33(2), 443–445], while the second was based on a Wiener filter that applies the a priori signal-to-noise ratio estimation presented by Ephraim (cited above) [Scalart, P., Filho, J.V., 1996. Speech enhancement based on a priori signal to noise estimation. IEEE Internat. Conf. Acoust. Speech Signal Process. 1, 629–632]. Nonsense word tests were used as the target speech in both cases, while one or two disturbing sources served as interference. Speech intelligibility before and after BSS was measured for three subjects with audiologically normal hearing. Next, the speech signal after BSS was denoised and presented to the same listeners. The results revealed some ambiguities caused by the insufficient number of microphones compared to the number of sound sources. With only one disturbance, the intelligibility improvement was significant. However, when there were two disturbances in addition to the target speech, the separation was much poorer. The additional denoising, as could be expected, raised the intelligibility slightly. Although the BSS method requires more research on optimization, the results of the investigation imply that it may be applied to hearing aids in the future.
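The Wiener-filter postprocessing driven by an a priori SNR estimate can be sketched as follows. This is an illustrative single-frequency-bin version of the widely used decision-directed recursion, not the code used in the study; the smoothing factor `alpha`, the floor `xi_min`, the stationary `noise_power`, and the function name `wiener_gains` are all assumptions chosen for the example.

```python
from typing import List, Sequence


def wiener_gains(
    noisy_power: Sequence[float],
    noise_power: float,
    alpha: float = 0.98,    # assumed smoothing factor
    xi_min: float = 1e-3,   # assumed floor on the a priori SNR
) -> List[float]:
    """Per-frame Wiener gains for one frequency bin.

    noisy_power holds |Y(t)|^2 for successive frames; noise_power is an
    assumed-known, stationary noise power estimate for the same bin.
    """
    gains: List[float] = []
    prev_clean_power = 0.0  # |S_hat(t-1)|^2, zero before the first frame
    for y2 in noisy_power:
        gamma = y2 / noise_power  # a posteriori SNR
        # Decision-directed a priori SNR: blend the previous clean-speech
        # estimate with the instantaneous estimate max(gamma - 1, 0).
        xi = alpha * prev_clean_power / noise_power \
            + (1.0 - alpha) * max(gamma - 1.0, 0.0)
        xi = max(xi, xi_min)
        g = xi / (1.0 + xi)       # Wiener gain
        gains.append(g)
        prev_clean_power = (g * g) * y2
    return gains


# Usage: a bin that stays well above the noise floor, then a noise-only bin.
loud = wiener_gains([100.0] * 50, noise_power=1.0)
quiet = wiener_gains([1.0] * 5, noise_power=1.0)
```

With a large `alpha`, the gain rises smoothly over the first few frames of a strong signal and stays near the floor when the bin contains only noise, which is the behaviour usually credited with reducing musical noise.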

Articles published on Minimum Mean-square Error Log-spectral Amplitude

Dual-stream Noise and Speech Information Perception based Speech Enhancement

Improved Non-Negative Matrix Factorization-Based Noise Reduction of Leakage Acoustic Signals.

Microphone array speech enhancement based on optimized IMCRA

On pre-image iterations for speech enhancement.

Speech Processing System Using a Noise Reduction Neural Network Based on FFT Spectrums

Use of speech presence uncertainty with MMSE spectral energy estimation for robust automatic speech recognition

Speech intelligibility improvement using convolutive blind source separation assisted by denoising algorithms

Compensating Acoustic Mismatch Using Class-Based Histogram Equalization for Robust Speech Recognition

Speech Enhancement with Natural Sounding Residual Noise Based on Connected Time-Frequency Speech Presence Regions

Incorporating a Psychoacoustical Model in Frequency Domain Speech Enhancement
