Abstract
Current deep learning approaches to linear prediction coefficient (LPC) estimation for the augmented Kalman filter (AKF) produce biased estimates, due to the use of a whitening filter. This severely degrades the perceived quality and intelligibility of enhanced speech produced by the AKF. In this paper, we propose a deep learning framework that produces clean speech and noise LPC estimates with significantly less bias than previous methods, by avoiding the use of a whitening filter. The proposed framework, called DeepLPC, jointly estimates the clean speech and noise LPC power spectra. The estimated clean speech and noise LPC power spectra are passed through the inverse Fourier transform to form autocorrelation matrices, which are then solved using the Levinson-Durbin recursion to obtain the LPCs and prediction error variances of the speech and noise for the AKF. The performance of DeepLPC is evaluated on the NOIZEUS and DEMAND Voice Bank datasets using subjective AB listening tests, as well as seven objective measures (CSIG, CBAK, COVL, PESQ, STOI, SegSNR, and SI-SDR). DeepLPC is compared to six existing deep learning-based methods. Compared to other deep learning approaches to clean speech LPC estimation, DeepLPC produces a lower spectral distortion (SD) level, confirming that it exhibits less bias. DeepLPC also produces higher objective scores than any of the competing methods (with an improvement of 0.11 for CSIG, 0.15 for CBAK, 0.14 for COVL, 0.13 for PESQ, 2.66% for STOI, 1.11 dB for SegSNR, and 1.05 dB for SI-SDR over the next best method). The enhanced speech produced by DeepLPC was also the most preferred by 10 listeners. By producing less biased clean speech and noise LPC estimates, DeepLPC enables the AKF to produce enhanced speech with higher quality and intelligibility.
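The step from an LPC power spectrum to LPCs and a prediction error variance can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the FFT size of 512, the example poles, and the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p are all assumptions made for illustration. The LPC power spectrum is taken to be err_var / |A(e^jw)|^2. To keep the example self-checking, the spectrum is generated from known coefficients rather than estimated by a network, and the recursion is expected to recover those coefficients.

```python
import numpy as np

def lpc_power_spectrum(a, err_var, n_fft=512):
    """LPC power spectrum err_var / |A(e^jw)|^2 on n_fft//2 + 1 bins.

    Assumed convention: A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p.
    """
    A = np.fft.rfft(np.concatenate(([1.0], a)), n_fft)
    return err_var / np.abs(A) ** 2

def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for autocorrelation lags r[0..order].

    Returns the coefficients a_1..a_p and the prediction error variance.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Correlation between the current predictor and lag i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k                  # updated prediction error variance
    return a[1:], err

def lpc_from_power_spectrum(power_spectrum, order):
    """Inverse FFT of the power spectrum gives autocorrelation lags."""
    r = np.fft.irfft(power_spectrum)
    return levinson_durbin(r[:order + 1], order)

# Hypothetical stable 4th-order model built from two conjugate pole pairs.
poles = np.array([0.9 * np.exp(1j * 0.3 * np.pi), 0.7 * np.exp(1j * 0.7 * np.pi)])
a_true = np.poly(np.concatenate((poles, poles.conj()))).real[1:]
var_true = 0.5

spectrum = lpc_power_spectrum(a_true, var_true)
a_est, var_est = lpc_from_power_spectrum(spectrum, order=4)
```

In this round trip, `a_est` and `var_est` should match `a_true` and `var_true` to within numerical precision, because the spectrum is exactly that of an all-pole model. With a spectrum estimated by a network, the recovered parameters would only be as accurate as the estimate.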
Highlights
The main objective of a speech enhancement algorithm (SEA) is to improve the quality and intelligibility of noise-corrupted speech [1]
To address the shortcomings of the Deep Xi-augmented Kalman filter (AKF) and Deep Xi-Kalman filter (KF) highlighted in the previous section, we propose the DeepLPC framework
DeepLPC maps each frame of the noisy speech magnitude spectrum to the linear prediction coefficient (LPC) power spectrum of the clean speech and noise signal
Summary
The main objective of a speech enhancement algorithm (SEA) is to improve the quality and intelligibility of noise-corrupted speech (or noisy speech) [1]. In [33], Yu et al. adopted a DNN and an LSTM network to estimate the clean speech and noise LPCs, respectively, as well as multi-band spectral subtraction (MB-SS) post-processing [4] for coloured-noise AKF-based speech enhancement (LSTM-CKFS). In light of the shortcomings of existing deep learning-based KF and AKF methods (presented in Table 1), this paper introduces DeepLPC, a deep learning framework for accurately estimating the clean speech and noise LPC parameters. The proposed method aims to mitigate the weaknesses of previously proposed deep learning-based KFs and AKFs by providing an improved estimate of the clean speech LPCs. The structure of this paper is as follows: background knowledge is presented, including the signal.