Speech Enhancement for Robust Speech Recognition Using Weighted Low Rank and Sparse Decomposition Models under Low SNR Conditions

Venkata Sridhar K,Kishore Kumar T

doi:10.18280/ts.390226

Venkata Sridhar K, Kishore Kumar T

Open Access

https://doi.org/10.18280/ts.390226

Copy DOI

Journal: Traitement du Signal	Publication Date: Apr 30, 2022
Citations: 2	License type: public-domain

Affiliation: National Institute of Technology Warangal

Abstract

Noise estimation is a crucial stage in speech enhancement (SE), and it commonly necessitates the use of prior models for speech, noise, or both. Prior models, on the other hand, can be ineffective in dealing with unseen nonstationary noise, especially at low signal to noise (SNR) levels. This paper proposes to assess the efficacy of an unsupervised SE approach based on weighted low rank and sparse matrix factorization to estimate noise and speech when neither is available beforehand by decomposing the input noisy spectrum into a low-rank noise component and a sparse speech component. Due to the approximation of the actual rank of noise, these techniques are constrained, and they do not directly exploit the low-rank property in optimization. Nuclear norm minimization (NNM) is the most well-known approach, as it can precisely recover the matrix's rank under certain restricted and theoretical guarantee conditions. NNM, on the other hand, is unable to reliably estimate the matrix rank in many situations. Significant advancements in computer vision and machine learning applications have demonstrated that a weighted nuclear norm minimization (WNNM), overcomes NNM shortcomings, and achieves a superior matrix rank approximation than NNM. Consequently, in this study, we present alternate SE algorithms that make use of weighted low rank and sparsity constraints to separate speech and noise spectrograms. Following that, they were trained and evaluated on a standard Automatic Speech Recognition (ASR) engine to lower the Word Error Rate (WER). Extensive investigations on the impact of real-world noise on speech signals show that the proposed model outperforms the existing state of art models in terms of objective measures like SDR, PESQ, SIG, BAK, OVL, and STOI values in varied noise circumstances under low SNR environments.

Full Text