Towards More Efficient DNN-Based Speech Enhancement Using Quantized Correlation Mask

Salinna Abdullah, Majid Zamani, Andreas Demosthenous

doi:10.1109/access.2021.3056711

Open Access

https://doi.org/10.1109/access.2021.3056711

Journal: IEEE Access	Publication Date: Jan 1, 2021
Citations: 61	License type: CC BY 4.0

Affiliation: University College London

Abstract

Many studies on deep learning-based speech enhancement (SE) that use the computational auditory scene analysis method employ the ideal binary mask or the ideal ratio mask to reconstruct the enhanced speech signal. However, many SE applications in real scenarios demand a balance between denoising capability and computational cost. In this study, first, an improvement over the ideal ratio mask is proposed to attain superior SE performance by introducing an efficient adaptive correlation-based factor for adjusting the ratio mask. The proposed method exploits the correlation coefficients among the noisy speech, noise and clean speech to re-distribute the power ratio of the speech and noise during the ratio mask construction phase. Second, to make the supervised SE system more computationally efficient, quantization techniques are considered to reduce the number of bits needed to represent floating-point numbers, leading to a more compact SE model. The proposed quantized correlation mask is used in conjunction with a 4-layer deep neural network (DNN-QCM) comprising dropout regularization, pre-training and noise-aware training to derive a robust, high-order mapping in enhancement and to improve generalization capability in unseen conditions. Results show that the quantized correlation mask outperforms the conventional ratio mask representation and the other SE algorithms used for comparison. When compared to a DNN with the ideal ratio mask as its learning target, the DNN-QCM provided an improvement of approximately 6.5% in the short-time objective intelligibility score and 11.0% in the perceptual evaluation of speech quality score. The quantization method can reduce the neural network weights from a 32-bit to a 5-bit representation while still effectively suppressing stationary and non-stationary noise. Timing analyses also show that, with the techniques incorporated in the proposed DNN-QCM system to increase its compactness, the training and inference times can be reduced by 15.7% and 10.5%, respectively.
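As a point of reference for the two baseline masks named in the abstract, the following is a minimal sketch of the conventional ideal ratio mask and ideal binary mask computed per time-frequency unit. The function names, the exponent `beta = 0.5`, the 0 dB local criterion and the toy arrays are illustrative assumptions; the paper's correlation-based adjustment factor is not reproduced here because its formula is not given in this text.

```python
import numpy as np

def ideal_ratio_mask(speech_power, noise_power, beta=0.5, eps=1e-12):
    """Conventional IRM: (S / (S + N)) ** beta for each time-frequency unit.

    beta = 0.5 is a commonly used value and an assumption here.
    """
    return (speech_power / (speech_power + noise_power + eps)) ** beta

def ideal_binary_mask(speech_power, noise_power, lc_db=0.0, eps=1e-12):
    """Conventional IBM: 1 where the local SNR (dB) exceeds lc_db, else 0."""
    snr_db = 10.0 * np.log10((speech_power + eps) / (noise_power + eps))
    return (snr_db > lc_db).astype(np.float32)

# Toy power spectrograms, shape (frequency bins, frames); values are made up.
speech_power = np.array([[4.0, 1.0], [0.0, 9.0]])
noise_power = np.array([[4.0, 3.0], [1.0, 1.0]])

irm = ideal_ratio_mask(speech_power, noise_power)
ibm = ideal_binary_mask(speech_power, noise_power)
```

The ratio mask takes continuous values in [0, 1], whereas the binary mask keeps or discards each unit outright; the correlation mask described above is presented as a refinement of the former.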

Highlights

Speech enhancement (SE) is the task of separating speech from nonspeech noise, used with the aim to improve perceived quality and intelligibility of speech
1) The computational cost of the proposed sum tables method is significantly lower than direct calculation using the ICC definition, and this is achieved without sacrificing SE performance, as the energy proportions between speech and noise are used to adjust the ratio mask more precisely; 2) fixed and k-means quantization techniques are used in a two-stage process to reduce the number of bits required to represent the neural network weights, learning target and acoustic features, making the supervised deep learning-based SE more compact; and 3) a deep neural network (DNN) has been optimized through experimental findings for use in combination with the proposed quantized correlation mask (QCM)
Quantization techniques are further applied to the QCM, neural network weights and acoustic features extracted to make the DNN more compact
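To make the idea of fixed quantization of a mask concrete, here is a minimal sketch that assumes "fixed" means uniformly spaced levels over a known range. The function name `uniform_quantize`, the range [0, 1], the 3-bit setting and the sample values are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def uniform_quantize(x, n_bits, lo=0.0, hi=1.0):
    """Map values in [lo, hi] to integer codes on 2**n_bits evenly spaced levels.

    Returns the integer codes and the dequantized (reconstructed) values.
    """
    steps = 2 ** n_bits - 1               # number of intervals between levels
    x = np.clip(x, lo, hi)                # values outside the range saturate
    codes = np.rint((x - lo) / (hi - lo) * steps).astype(np.int64)
    dequantized = lo + codes / steps * (hi - lo)
    return codes, dequantized

# Made-up mask values in [0, 1], quantized to 3 bits (8 levels).
mask = np.array([0.0, 0.12, 0.4, 0.93, 1.0])
codes, dequantized = uniform_quantize(mask, n_bits=3)
max_error = np.abs(dequantized - mask).max()
```

With uniform levels the reconstruction error of any in-range value is at most half a step, which is what makes the bit width a direct accuracy versus size trade-off.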

Summary

INTRODUCTION

Speech enhancement (SE) is the task of separating speech from non-speech noise, with the aim of improving the perceived quality and intelligibility of speech. 1) The computational cost (or time) of the proposed sum tables method is significantly lower than direct calculation using the ICC definition, and this is achieved without sacrificing SE performance, as the energy proportions between speech and noise are used to adjust the ratio mask more precisely; 2) fixed and k-means quantization techniques are used in a two-stage process to reduce the number of bits required to represent the neural network weights, learning target and acoustic features, making the supervised deep learning-based SE more compact; and 3) a DNN has been optimized through experimental findings for use in combination with the proposed quantized correlation mask (QCM).
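The k-means stage of weight quantization can be illustrated with a small sketch: each weight is replaced by the index of its nearest centroid in a shared codebook, so a 5-bit index (32 centroids) stands in for a 32-bit float. This is a generic one-dimensional Lloyd iteration under stated assumptions (linearly spaced initial centroids, 20 iterations, random Gaussian weights, a codebook stored at 32 bits per entry); it is not the authors' two-stage procedure, whose details are not given in this text.

```python
import numpy as np

def kmeans_quantize(weights, n_bits=5, n_iter=20):
    """Cluster weights into 2**n_bits centroids; return indices and codebook."""
    flat = weights.ravel().astype(np.float64)
    k = 2 ** n_bits
    # Assumption: initialise centroids evenly between the min and max weight.
    centroids = np.linspace(flat.min(), flat.max(), k)
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every weight.
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        # Update step: move each non-empty centroid to the mean of its members.
        for j in range(k):
            members = flat[idx == j]
            if members.size:
                centroids[j] = members.mean()
    idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
    return idx.reshape(weights.shape).astype(np.uint8), centroids

# Hypothetical layer weights; shape and scale are made up for illustration.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.1, size=(64, 32)).astype(np.float32)

indices, codebook = kmeans_quantize(weights, n_bits=5)
reconstructed = codebook[indices]
mse = float(np.mean((reconstructed - weights) ** 2))

# Storage estimate: 5 bits per index plus a 32-entry codebook at 32 bits each.
original_bits = weights.size * 32
compressed_bits = weights.size * 5 + codebook.size * 32
```

For this toy layer the estimate is 65,536 bits against 11,264 bits; the codebook overhead becomes negligible as the layer grows, and the ratio then approaches 32/5.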

THE PROPOSED SYSTEM

CORRELATION-BASED TRAINING TARGET

DATASETS

CONCLUSION

Similar Papers

Increasing Compactness of Deep Learning Based Speech Enhancement Models With Parameter Pruning and Quantization Techniques
Jyun-Yi Wu ... Chih-Ting Liu
IEEE Signal Processing Letters | VOL. 26
01 Dec 2019

Speech enhancement of non-stationary noise based on controlled forward moving average
Dariush Farrokhi ... Roberto Togneri
01 Oct 2007

Kalman Filtering with Machine Learning Methods for Speech Enhancement
04 May 2021

A new regularized forward blind source separation algorithm for automatic speech quality enhancement
Meriem Zoulikha ... Mohamed Djendi
Applied Acoustics | VOL. 112
03 Jun 2016
