Input Noisy Speech Research Articles

Variance based two dimensional time-frequency mask estimation for unsupervised speech enhancement is proposed to improve the speech quality and intelligibility by reducing the low-frequency residual noise distortion in the noisy speech signals. Unlike conventional speech enhancement methods, the proposed method is able to reduce the residual noise distortion by utilizing benefits of the less aggressive Wiener gain and variance based two dimensional time-frequency mask to establish a two-stage speech enhancement method. In the first stage, the less aggressive Wiener gain with modified a priori signal-to-noise (SNR) estimate is applied to the input noisy speech to obtain a reduced noise pre-processed speech signal. In the second stage, variance based features are extracted from the pre-processed speech and compared to a nonparametric adaptive threshold to construct a two dimensional time-frequency mask. The estimated mask is then applied to the pre-processed speech from the first stage to suppress the annoying residual noise distortion. A comparative performance study is included to demonstrate the effectiveness of the proposed method in various noisy conditions. The experimental results showed large improvements in terms of the perceptual evaluation of speech quality (PESQ), segmental SNR (SegSNR), residual noise distortion (BAK) and speech distortion (SIG) over that achieved with competing methods at different input SNRs. To measure the understanding of enhanced speech in different noisy conditions, short-time intelligibility prediction (STOI) is used which reinforced a better performance of the proposed method in terms of the speech intelligibility. The time-varying spectral analysis validated significant reduction of the residual noise components in the enhanced speech.

Read full abstract

잡음에 강건한 음성 인식을 위한 주파수 변이를 이용한 PMC( Parallel Model Compensation Using Frequency-variant, FV-PMC) 방법은 인식시 입력음성에 혼입이 예상되는 잡음들을 평균 주파수 변이도를 임계치로 하여 몇 가지 잡음 군으로 분류한 후 각 잡음 군 별로 인식을 수행하는 방법이다. 이 방법은 기준 임계치를 이용하여 양호하게 분류된 잡음 음성들에 대해서는 매우 우수한 성능을 보이나, 미 분류된 잡음 음성들에 대해서는 기존의 PMC 방법에서와 같이 무잡음 모델과 결합하여 음성 인식을 수행함으로 인해 평균 음성 인식률이 낮아지는 문제점이 있다. 이러한 문제점을 해결하기 위하여 본 논문에서는 기존의 방법에서 사용하였던 평균주파수 임계치 방법 대신에 최대 우도를 부가하여 미분류를 방지함으로써 입력 잡음음성에 포함되는 잡음의 군별 잡음 분류 율을 높여 인식률을 제고하는 개선된 주파수 변이 PMC 인식방법을 제안하였다. Aurora 2.0 데이터베이스를 이용한 인식실험결과, 기존의 FV-PMC 방법에 비해 향상된 결과를 확인할 수 있었다. The Parallel Model Compensation Using Frequency-variant: FV-PMC for noise-robust speech recognition is a method to classify the noises, which are expected to be intermixed with input speech when recognized, into several groups of noises by setting average frequency variant as a threshold value; and to recognize the noises depending on the classified groups. This demonstrates the excellent performance considering noisy speech categorized as good using the standard threshold value. However, it also holds a problem to decrease the average speech recognition rate with regard to unclassified noisy speech, for it conducts the process of speech recognition, combined with noiseless model as in the existing PMC. To solve this problem, this paper suggests a enhanced method of recognition to prevent the unclassified through improving the extent of rating scales with use of maximum likelihood so that the noise groups, including input noisy speech, can be classified into more specific groups, which leads to improvement of the recognition rate. The findings from recognition experiments using Aurora 2.0 database showed the improved results compared with those from the method of the previous FV-PMC.

Read full abstract

Input Noisy Speech Research Articles

Related Topics

Articles published on Input Noisy Speech

End‐to‐end speech‐denoising deep neural network based on residual‐attention gated linear units

Deep Neural Network for Supervised Single-Channel Speech Enhancement

CAQoE: A Novel No-Reference Context-aware Speech Quality Prediction Metric

End-to-End Neural Based Modification of Noisy Speech for Speech-in-Noise Intelligibility Improvement

Noise-Robust Voice Conversion Using High-Quefrency Boosting via Sub-Band Cepstrum Conversion and Fusion

Bark scaled oversampled WPT based speech recognition enhancement in noisy environments

Variance based time-frequency mask estimation for unsupervised speech enhancement

Performance improvement of monaural speech separation system using image analysis techniques

An Auto-Focusing-Noise Suppressor for Cell-Phone Videos Based on Multiple Noise References and Spectral Gain Selection

Robust Recognition of English Speech in Noisy Environments Using Frequency Warped Signal Processing

A wavelet- based transform method for quality improvement in noisy speech patterns of Arabic language

최대우도를 부가한 주파수 변이 PMC 방법의 잡음 음성 인식 성능개선

Speech enhancement based on soft audible noise masking and noise power estimation

Statistical voice activity detection based on sparse representation over learned dictionary

Dealing with noise in automatic speech recognition.

A Subtractive-Type Speech Enhancement Using the Perceptual Frequency-Weighting Function

A noise reduction technique of speech signal using ICA and spectral analysis

Dynamic Bayesian Network Inversion for Robust Speech Recognition

A Comprehensive Noise Robust Speech Parameterization Algorithm Using Wavelet Packet Decomposition-Based Denoising and Speech Feature Representation Techniques

Piecewise-linear transformation-based HMM adaptation for noisy speech

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Input Noisy Speech Research Articles

Related Topics

Articles published on Input Noisy Speech

End‐to‐end speech‐denoising deep neural network based on residual‐attention gated linear units

Deep Neural Network for Supervised Single-Channel Speech Enhancement

CAQoE: A Novel No-Reference Context-aware Speech Quality Prediction Metric

End-to-End Neural Based Modification of Noisy Speech for Speech-in-Noise Intelligibility Improvement

Noise-Robust Voice Conversion Using High-Quefrency Boosting via Sub-Band Cepstrum Conversion and Fusion

Bark scaled oversampled WPT based speech recognition enhancement in noisy environments

Variance based time-frequency mask estimation for unsupervised speech enhancement

Performance improvement of monaural speech separation system using image analysis techniques

An Auto-Focusing-Noise Suppressor for Cell-Phone Videos Based on Multiple Noise References and Spectral Gain Selection

Robust Recognition of English Speech in Noisy Environments Using Frequency Warped Signal Processing

A wavelet- based transform method for quality improvement in noisy speech patterns of Arabic language

최대우도를 부가한 주파수 변이 PMC 방법의 잡음 음성 인식 성능개선

Speech enhancement based on soft audible noise masking and noise power estimation

Statistical voice activity detection based on sparse representation over learned dictionary

Dealing with noise in automatic speech recognition.

A Subtractive-Type Speech Enhancement Using the Perceptual Frequency-Weighting Function

A noise reduction technique of speech signal using ICA and spectral analysis

Dynamic Bayesian Network Inversion for Robust Speech Recognition

A Comprehensive Noise Robust Speech Parameterization Algorithm Using Wavelet Packet Decomposition-Based Denoising and Speech Feature Representation Techniques

Piecewise-linear transformation-based HMM adaptation for noisy speech