Speech Enhancement System for Automatic Speech Recognition in Automotive Environment

Gokul G Nair, C Santhosh Kumar

doi:10.1109/icccnt51525.2021.9579986

Abstract

Automatic Speech Recognition (ASR) is becoming an essential feature in automobiles. The main goal of ASR is to get the machine to understand spoken language. An automotive speech recognition system supports the "hands on the wheel, eyes on the road" principle of automotive design by making various physical controls of the vehicle available to the driver through speech commands. Automotive speech recognition differs from other speech recognition systems in its operating environment: noise sources within the automotive cabin degrade recognition performance. For signals with a low Signal to Noise Ratio (SNR), the speech recognition system alone cannot interpret the speech signal, and the automotive speech recognition system fails. This paper aims to improve the intelligibility and quality of speech in the automotive environment by processing the speech signal before feeding it to the speech recognition system. We have performed experiments on classical speech enhancement techniques and on Deep Neural Network (DNN) based models. We then created a WaveNet-Long Short Term Memory (LSTM) network that processes the signals to enhance speech quality and suppress noise content, so that the speech recognition system can work accurately even on low-SNR signals.

Full Text