Abstract

This paper reviews methods for speech recognition and speech enhancement. Speech enhancement usually acts as a front end that improves the performance of an automatic speech recognition (ASR) system. Signals captured by microphones are distorted by reverberation and background noise, and this degradation makes it difficult for the recognizer to identify the speech. Using the magnitude spectrograms of the degraded speech, recurrent neural networks (RNNs) and deep neural networks (DNNs) are trained to perform spectral masking. Related approaches covered in the review include the two-stage transformer-based neural network (TSTNN), the minimum overlap-gap algorithm, the residual long short-term memory neural network (ResLSTM), and the deep complex convolution recurrent network (DCCRN). Speech can also be enhanced and recognized with other processing techniques, including spectral subtraction, Wiener and Kalman filtering, MMSE estimation, phase spectrum compensation, the multichannel end-to-end (ME2E) system, binaural codebook-based speech enhancement, the progressive learning-based adaptive noise and speech estimation (PL-ANSE) method, voice activity detection (VAD), adaptive noise reduction algorithms, and beamforming. In all cases, the noise embedded in the speech must be suppressed for the speech recognition system to understand the speech more effectively.
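As an illustration of the simplest technique listed above, magnitude spectral subtraction can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the procedure of any specific paper in the review: the function names (`stft`, `istft`, `spectral_subtraction`), the frame length, the hop size, the spectral floor, and the assumption that the first few frames contain only noise are all illustrative choices.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Short-time Fourier transform with a Hann window (illustrative parameters)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, frame_len=256, hop=128):
    """Inverse STFT by weighted overlap-add, normalized by the summed squared window."""
    window = np.hanning(frame_len)
    frames = np.fft.irfft(spec, n=frame_len, axis=1)
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame * window
        norm[i * hop:i * hop + frame_len] += window ** 2
    return out / np.maximum(norm, 1e-8)

def spectral_subtraction(noisy, noise_frames=10, floor=0.05):
    """Subtract an average noise magnitude spectrum from every frame.

    Assumption: the first `noise_frames` frames contain noise only, so their
    mean magnitude serves as the noise estimate. The noisy phase is reused.
    """
    spec = stft(noisy)
    mag, phase = np.abs(spec), np.angle(spec)
    noise_mag = mag[:noise_frames].mean(axis=0)
    # A spectral floor keeps magnitudes non-negative and limits musical noise.
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    return istft(clean_mag * np.exp(1j * phase))
```

A VAD, also mentioned above, would normally replace the fixed "first frames are noise" assumption by deciding which frames to use for the noise estimate.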
