Chinese Speech Recognition System Based on Neural Network Acoustic Network Model

Yuhan Song

doi:10.1016/j.procs.2023.11.018

Abstract

This paper studies a Chinese speech recognition system built on a neural network acoustic model. It first analyzes a neural-network-based speech recognition system. The system applies sampling and filtering, pre-emphasis, framing, and endpoint detection to the speech signal, extracts LPC, LPCC, and MFCC features from the preprocessed data, refines the MFCC features, and then trains a neural network model. Second, recognition accuracy declines when the target Chinese speech under test differs substantially from the speech of the speakers in the training data. To address this, a speaker adaptation (SA) method based on a deep neural network (DNN) is proposed. The method performs speaker adaptation in the feature space: by adding speaker identity vector information to the DNN acoustic model, it removes speaker-specific variation from the features, which reduces the impact of speaker differences while retaining semantic information. Finally, experiments on the open-source TEDLIUM dataset show that, with fbank and fMLLR features respectively, the method improves the word error rate (WER) by 7.8% and 6.8% compared with the baseline DNN acoustic model. A reasonable choice of speech features can improve both the training efficiency and the recognition accuracy of the model, thereby improving overall recognition performance.
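Since the results above are reported as word error rate, a minimal sketch of the standard WER definition may help: the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not the scoring tool used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: both transcripts are already tokenized by whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

For Chinese transcripts, the same computation is often applied to characters rather than words, which would only change how the two strings are tokenized.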

Full Text