Speech recognition is an important branch of artificial intelligence that aims to transform human speech into computer-readable text. However, it still faces many challenges, such as noise interference and differences in accent and speech rate. This paper explores a deep learning-based speech recognition method to improve recognition accuracy and robustness. It first introduces the basic principles of speech recognition and the existing mainstream technologies, and then focuses on deep learning-based methods. Comparative experiments show that the self-attention mechanism performs best in speech recognition tasks. To further improve performance, this paper proposes a deep learning model that combines the self-attention mechanism with DCNN-GRU. The model attends dynamically to the input speech by introducing the self-attention mechanism into the neural network in place of an RNN, together with a deep convolutional neural network, which improves its robustness and recognition accuracy. The experiments use AISHELL-1, a Chinese speech dataset of 170 h. Compared with the deep convolutional neural network, the proposed model reduces the character error rate (CER) by at least 6%. Compared with a bidirectional gated recurrent neural network, it reduces the CER by 0.7%. Finally, the factors that influence the CER are analyzed on the test set. The experimental results show that the model performs well under various noise environments and accent conditions.
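Since the results above are reported as CER, the following minimal sketch shows how a character error rate is commonly computed: the character-level edit distance between the reference and the recognized text, divided by the reference length. This is the standard definition, not necessarily the paper's exact scoring script; the function names `edit_distance` and `cer` and the sample sentences are illustrative assumptions.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one substituted character out of six.
    print(cer("今天天气很好", "今天天汽很好"))
```

Because the metric is computed per character rather than per word, it suits Chinese transcripts such as those in AISHELL-1, where word boundaries are not marked in the text.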