Spectral Analysis for Automatic Speech Recognition and Enhancement

Jane Oruh, Serestina Viriri

doi:10.1007/978-3-030-70866-5_16

Abstract

Accurate recognition of noisy speech signals remains an obstacle to the wider application of speech recognition technology. The robustness of a speech recognition system depends heavily on its ability to handle background noise. In this work, a Short Time Fourier Transform (STFT) filtering technique for the enhancement and recognition of speech signals is presented. Conventionally, STFT filtering has been applied in speech analysis. In this study, however, a modified STFT with an adaptive window width based on the chirp rate, termed ASTFT, is combined with spectrogram features for speech recognition and enhancement. The LibriSpeech ASR Corpus is the benchmark dataset for this experiment. The spectrum of the enhanced speech signal is estimated using several spectrogram features to obtain a unit peak amplitude. A priori Signal-to-Noise Ratio (SNR) estimation is performed on the modified STFT speech signal and yields an SNR of 31.86 dB, a level at which the speech signal is considered effectively clean.

Keywords: Noise reduction, STFT filtering, Spectrum estimation, Automatic speech recognition, Speech enhancement, Signal-to-Noise-Ratio
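
To make the reported SNR figure concrete, the following is a minimal sketch of the standard decibel definition, SNR = 10 log10(signal power / noise power). It is not the a priori estimator used in this work. The function name `snr_db` and the toy tones (frequencies, amplitudes, sampling rate) are illustrative assumptions, and the sketch presumes that a clean reference signal is available, which is not the case in a real enhancement setting.

```python
import math

def snr_db(clean, noisy):
    """SNR in dB, treating (noisy - clean) as the noise component.

    Hypothetical helper for illustration; assumes a clean reference exists.
    """
    signal_power = sum(s * s for s in clean)
    noise_power = sum((x - s) ** 2 for s, x in zip(clean, noisy))
    if noise_power == 0:
        return math.inf  # identical signals: no noise at all
    return 10.0 * math.log10(signal_power / noise_power)

# Toy data (assumed values): a 440 Hz tone plus a 3 kHz interfering tone
# at one tenth of the amplitude, one second long at 16 kHz.
fs = 16000
clean = [math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]
noise = [0.1 * math.sin(2 * math.pi * 3000 * t / fs) for t in range(fs)]
noisy = [s + v for s, v in zip(clean, noise)]

print(round(snr_db(clean, noisy), 2))
```

In this toy case the amplitude ratio of 10 corresponds to a power ratio of 100, that is, about 20 dB. On the same scale, 31.86 dB corresponds to a signal power roughly 1,500 times the noise power.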

Full Text