Abstract

Accurate recognition of speech in noisy environments remains an obstacle to the wider application of speech recognition technology. The robustness of a speech recognition system depends heavily on its ability to handle background noise. Most existing speech recognition techniques lack this robustness and, as a result, are not applicable in real-time settings. In this work, we propose a model based on a Deep Fourier Neural Network (DFNN) for Automatic Speech Recognition (ASR) using the LibriSpeech dataset. To mitigate the challenges of background noise, the proposed technique analyzes raw audio waveforms in detail with the DFNN. This novel deep learning approach has a concise architecture and is efficient for automatic speech recognition. It embeds the Fourier transform, one of the most popular feature representation transforms in audio signal processing. The Fourier transform extracts the core information from a waveform in the form of short-term spectra of the speech signal as a function of time. The proposed DFNN model then analyzes these short-term spectra in depth to recognize speech accurately in the presence of noise.
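As a rough illustration of what "short-term spectra as a function of time" means, the following sketch computes a short-time Fourier transform of a waveform with NumPy. It is not the DFNN described above. The function name `short_term_spectra`, the 25 ms frame length, the 10 ms hop, the Hann window, and the synthetic noisy tone are all assumptions chosen for illustration; only the 16 kHz sampling rate reflects the LibriSpeech recordings.

```python
import numpy as np

def short_term_spectra(waveform, frame_len=400, hop=160):
    """Return magnitude spectra with shape (frames, frequency bins).

    frame_len and hop are illustrative assumptions: 400 and 160 samples
    correspond to 25 ms and 10 ms at a 16 kHz sampling rate.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(waveform) - frame_len) // hop
    # Slice the waveform into overlapping frames and taper each one.
    frames = np.stack([
        waveform[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One real-input FFT per frame gives the spectrum at that time step.
    return np.abs(np.fft.rfft(frames, axis=1))

# Synthetic stand-in for speech: one second of a 1 kHz tone plus noise.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
rng = np.random.default_rng(0)
noisy = np.sin(2 * np.pi * 1000 * t) + 0.1 * rng.standard_normal(t.shape)

spectra = short_term_spectra(noisy)
peak_bin = int(spectra.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / 400  # bin spacing is sample_rate / frame_len
print(spectra.shape, peak_hz)
```

Each row of `spectra` is the spectrum of one short frame, so the array as a whole is a time-frequency representation of the kind a network could take as input. In this toy signal the tone stays visible as a peak near 1 kHz despite the added noise.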
