Abstract
The performance of speech coding, speech recognition, and speech enhancement systems that rely on the augmented Kalman filter (AKF) largely depends upon the accuracy of clean speech and noise linear prediction coefficient (LPC) estimation. The formulation of clean speech and noise LPC estimation as a supervised learning task has recently shown considerable promise. Generally, a deep neural network (DNN) learns to map noisy speech features to a training target that can be used for clean speech and noise LPC estimation. Such training targets fall into four categories: line spectrum frequency (LSF), LPC power spectrum (LPC-PS), power spectrum (PS), and magnitude spectrum (MS) training targets. The choice of training target can have a significant impact on LPC estimation accuracy. Motivated by this, we perform a comprehensive study of the training targets with the aim of determining which is best for LPC estimation. To this end, we evaluate each training target using a temporal convolutional network (TCN) and a multi-head attention-based network. A large training set constructed from a wide variety of conditions, including real-world non-stationary and coloured noise sources over a range of signal-to-noise ratio (SNR) levels, is used for training. Testing on the NOIZEUS corpus demonstrates that the LPC-PS training target produces the lowest clean speech LPC spectral distortion (SD) level. We also construct the AKF using the clean speech and noise LPC parameters estimated from each training target. Subjective AB listening tests and seven objective quality and intelligibility evaluation measures (CSIG, CBAK, COVL, PESQ, STOI, SegSNR, and SI-SDR) reveal that the LPC-PS training target produces enhanced speech with the highest quality and intelligibility amongst the training targets.
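To make the LPC-PS and SD quantities named above concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the all-pole convention A(z) = 1 + sum_k a_k z^{-k} and a commonly used form of SD (the root-mean-square difference between two log power spectra, in dB). The abstract does not give the exact definitions used in the paper, and the function names, the number of frequency bins, and the toy coefficients are all hypothetical.

```python
import cmath
import math


def lpc_power_spectrum(lpc, gain=1.0, n_bins=257):
    """Power spectrum of an all-pole model: gain^2 / |A(e^{jw})|^2.

    Assumes A(z) = 1 + sum_k a_k z^{-k}, with `lpc` holding a_1..a_p.
    Frequencies are sampled uniformly from 0 to pi (inclusive).
    """
    spectrum = []
    for i in range(n_bins):
        omega = math.pi * i / (n_bins - 1)
        a = 1.0 + sum(c * cmath.exp(-1j * omega * (k + 1))
                      for k, c in enumerate(lpc))
        spectrum.append(gain ** 2 / abs(a) ** 2)
    return spectrum


def spectral_distortion_db(ps_ref, ps_est):
    """RMS difference between two power spectra expressed in dB."""
    diffs = [10.0 * math.log10(r) - 10.0 * math.log10(e)
             for r, e in zip(ps_ref, ps_est)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Toy first-order models (illustrative values only, not data from the paper).
clean_lpc = [-0.9]
estimated_lpc = [-0.85]

clean_ps = lpc_power_spectrum(clean_lpc)
estimated_ps = lpc_power_spectrum(estimated_lpc)
sd = spectral_distortion_db(clean_ps, estimated_ps)
print(f"SD between clean and estimated LPC-PS: {sd:.3f} dB")
```

Under these assumptions, a lower SD means the estimated LPC power spectrum tracks the clean one more closely, which is the sense in which the training targets are compared on clean speech LPC estimation.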
More From: Speech Communication