Accurate recognition of speech in noisy environments remains an obstacle to the wider application of speech recognition technology. The robustness of a speech recognition system depends heavily on its ability to handle background noise. Most existing speech recognition techniques are not robust to background noise and, as a result, are not applicable in real-time settings. To mitigate this problem, we propose a model based on a Deep Fourier Neural Network (DFNN) for Automatic Speech Recognition (ASR), which analyzes raw audio waveforms in detail and is evaluated on the LibriSpeech dataset. This novel deep learning approach has a concise architecture and is efficient for ASR. It embeds the Fourier transform, one of the most popular feature representation transforms in audio signal processing. The Fourier transform extracts the core information from a waveform in the form of short-term spectra of the speech signal as a function of time. The proposed DFNN model then analyzes these short-term spectra in depth to recognize speech accurately in the presence of noise.
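To illustrate what "short-term spectra of the speech signal as a function of time" means in practice, the following is a minimal NumPy sketch of a short-time Fourier transform. It is not the DFNN model or the authors' implementation. The function name `short_term_spectra`, the Hann window, the 25 ms frame length, and the 10 ms hop are all assumptions chosen for illustration, and the 16 kHz sample rate is assumed to match LibriSpeech audio.

```python
import numpy as np


def short_term_spectra(signal, frame_len=400, hop=160):
    """Return magnitude spectra of overlapping frames, one row per frame.

    Hypothetical helper: frame_len=400 and hop=160 correspond to 25 ms
    frames with a 10 ms step at an assumed 16 kHz sample rate.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")

    # Number of complete frames that fit in the signal.
    n_frames = 1 + (signal.size - frame_len) // hop

    # Row i of idx holds the sample indices of frame i.
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]

    # Taper each frame with a Hann window to reduce spectral leakage.
    frames = signal[idx] * np.hanning(frame_len)

    # Real-input FFT per frame; keep magnitudes only.
    return np.abs(np.fft.rfft(frames, axis=1))


# Usage: a synthetic one-second 1 kHz tone stands in for a speech waveform.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
spectra = short_term_spectra(tone)
peak_bins = spectra.argmax(axis=1)  # strongest frequency bin in each frame
```

Each row of `spectra` is the spectrum of one short frame, so the array as a whole describes how frequency content evolves over time. With these assumed settings the bin spacing is 16000 / 400 = 40 Hz, so the 1 kHz tone peaks at bin 25 in every frame.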