Performance analysis of neural network, NMF and statistical approaches for speech enhancement

Ravi Kumar Kandagatla, Venkata Subbaiah Potluri

doi:10.1007/s10772-020-09751-6

Abstract

Bayesian estimators are widely used in speech enhancement and noise reduction. However, traditional estimators process only the spectral amplitudes and leave the phase unprocessed. Among Bayesian estimators, super-Gaussian based estimators provide improved noise reduction, and super-Gaussian Bayesian estimators that use processed phase information when estimating the amplitudes improve the results further. In this work, Bayesian estimators based on Complex speech coefficients given Uncertain Phase (CUP), such as CUP-GG (CUP estimator with speech spectral coefficients assumed as Gamma and noise spectral coefficients as Generalized Gamma) and CUP-NG (speech as Nakagami), are compared under white noise, pink noise, babble noise and non-stationary factory noise conditions. The statistical estimators are less effective under completely non-stationary conditions such as non-stationary factory noise and babble noise. Non-negative Matrix Factorization (NMF) based algorithms perform better for non-stationary noises. The drawback of NMF is that it requires a priori knowledge about the speech. This drawback can be overcome by combining the advantages of the statistical and NMF approaches. The NR-NMF and WR-NMF speech enhancement methods are developed by providing posteriori regularization based on statistical assumptions about the distribution of the speech and noise DFT coefficients. A speech enhancement method that uses the CUP-GG estimator and NMF with online noise bases update is also considered for comparison. Progress in neural network based approaches has further shown that, with a large dataset and better training, speech enhancement algorithms give improved results. In this work, a neural network approach for speech enhancement is implemented and compared with the traditional estimators and NMF approaches. To generalize to unseen noise types, the proposed neural network approach uses dropout. The network is trained on features obtained from the a priori SNR and the a posteriori SNR. The objective of this paper is to analyze the performance of speech enhancement methods based on neural networks, NMF and statistical approaches. The objective performance measures Perceptual Evaluation of Speech Quality (PESQ), Short-Time Objective Intelligibility (STOI), Signal to Noise Ratio (SNR) and Segmental SNR (Seg SNR) are considered for comparison.
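
The abstract does not say how the a priori and a posteriori SNR features are computed. As a rough illustration only, the sketch below assumes the commonly used definitions: the a posteriori SNR is the noisy power spectrum divided by a noise power estimate, and the a priori SNR is obtained with a decision-directed recursion. The function name `snr_features`, the smoothing factor `alpha`, the floor `xi_min` and the Wiener gain used to carry the previous frame forward are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def snr_features(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Per-frame a priori (xi) and a posteriori (gamma) SNR estimates.

    noisy_power: array of shape (frames, bins) holding |Y|^2.
    noise_power: array of shape (bins,) holding a noise power estimate.
    alpha, xi_min: hypothetical smoothing factor and lower bound.
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)

    # A posteriori SNR: noisy power relative to the noise power.
    gamma = noisy_power / noise_power

    xi = np.empty_like(gamma)
    prev_clean_power = np.zeros_like(noise_power)
    for t in range(gamma.shape[0]):
        # Maximum-likelihood term, half-wave rectified to stay non-negative.
        ml_term = np.maximum(gamma[t] - 1.0, 0.0)
        # Decision-directed mix of the previous clean-speech estimate
        # and the current maximum-likelihood term.
        xi[t] = alpha * prev_clean_power / noise_power + (1.0 - alpha) * ml_term
        xi[t] = np.maximum(xi[t], xi_min)
        # Assumed Wiener gain, used only to estimate the clean power
        # that feeds the next frame of the recursion.
        gain = xi[t] / (1.0 + xi[t])
        prev_clean_power = (gain ** 2) * noisy_power[t]
    return xi, gamma


# Toy input: 3 frames, 2 frequency bins, unit noise power.
xi, gamma = snr_features(np.full((3, 2), 4.0), np.ones(2))
```

In a setup like the one the abstract describes, arrays such as `xi` and `gamma` (often in the log domain) could be stacked per frame and passed to the network as input features; the exact feature layout used in the paper is not stated here.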
