Abstract

Owing to recent research developments, deep learning models have become powerful alternatives for speech enhancement and recognition in many real-world applications. Although state-of-the-art models achieve impressive background noise reduction, the challenge remains to design robust models that improve quality, intelligibility, and word error rate. We propose a novel residual connection-based Bidirectional Gated Recurrent Unit (BiGRU) augmented Kalman filtering model for speech enhancement and recognition. In the proposed model, clean speech and noise signals are modeled as autoregressive processes whose parameters are the linear prediction coefficients (LPCs) and driving noise variances. Recurrent neural networks are trained to estimate the line spectrum frequencies (LSFs), while an optimization problem is solved to obtain the noise variances by minimizing the divergence between the modeled and predicted autoregressive spectra of the noise-contaminated speech. Augmented Kalman filtering with the estimated parameters is applied to the noisy speech for background noise reduction, improving speech quality, intelligibility, and word error rates. A bidirectional GRU network is implemented, which predicts parameters from both the future and past contexts of the input sequence and excels at modeling long-term dependencies. A compensated phase spectrum is used to recover the enhanced speech signals. The Kaldi toolkit is employed to train the automatic speech recognition (ASR) system and measure word error rates (WERs). On the LibriSpeech dataset, the proposed model improved quality, intelligibility, and word error rate by 35.52%, 18.79%, and 19.13%, respectively, under various noisy environments.
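To make the filtering stage of the pipeline concrete, the following is a minimal NumPy sketch of an augmented Kalman filter in which clean speech and noise are each modeled as autoregressive processes and stacked into one state vector. It only illustrates the enhancement step; the BiGRU-based LSF/LPC estimation and noise-variance optimization described above are assumed to have already produced the parameters. The function and variable names (`companion`, `augmented_kalman_frame`, `a_s`, `var_s`, `a_v`, `var_v`) are illustrative, not from the paper.

```python
import numpy as np

def companion(lpc):
    """State-transition matrix for an AR process with prediction
    coefficients lpc = [a1, ..., ap], i.e. s(n) = sum_i a_i * s(n-i) + w(n)."""
    p = len(lpc)
    A = np.zeros((p, p))
    A[0, :] = lpc              # newest sample predicted from the p previous ones
    A[1:, :-1] = np.eye(p - 1) # shift older samples down the state vector
    return A

def augmented_kalman_frame(y, a_s, var_s, a_v, var_v):
    """Enhance one frame of noisy samples y given estimated speech LPCs a_s
    with excitation variance var_s and noise LPCs a_v with variance var_v.
    Rough sketch: a full system would carry the state across frames and
    re-estimate the parameters per frame (here via the BiGRU stage)."""
    p, q = len(a_s), len(a_v)
    # block-diagonal transition for the augmented [speech; noise] state
    A = np.block([[companion(a_s), np.zeros((p, q))],
                  [np.zeros((q, p)), companion(a_v)]])
    # observation picks the current speech and noise samples: y(n) = s(n) + v(n)
    H = np.zeros((1, p + q)); H[0, 0] = 1.0; H[0, p] = 1.0
    # process-noise covariance driven by the two excitation variances
    Q = np.zeros((p + q, p + q)); Q[0, 0] = var_s; Q[p, p] = var_v
    x = np.zeros((p + q, 1))
    P = np.eye(p + q)
    enhanced = np.zeros(len(y))
    for n, obs in enumerate(y):
        # predict
        x = A @ x
        P = A @ P @ A.T + Q
        # update (measurement noise is already part of the augmented state)
        K = P @ H.T / (H @ P @ H.T)
        x = x + K * (obs - H @ x)
        P = (np.eye(p + q) - K @ H) @ P
        enhanced[n] = x[0, 0]  # current clean-speech estimate
    return enhanced
```

In this formulation the observation equation carries no separate measurement-noise term because the additive noise is itself part of the augmented state, which is what distinguishes the augmented Kalman filter from a standard Kalman filter with a scalar noise variance.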
