Abstract

At present, most speech recognition research targets broad regional language varieties, and few studies address the dialects of individual cities. Mainstream speech recognition methods are mostly built on ResNet, using a ResNet network as the acoustic model and an N-gram model as the language model. In this study, DenseNet is used as the base network, and a dataset of the Zigong dialect, a subdivision of the Sichuan dialect, is taken as the object of speech recognition research. A DenseNet-BiGRU + CTC network is constructed as the acoustic model, and an RNN is used as the language model. Experiments show that the speech recognition model built on DenseNet achieves higher accuracy than the ResNet-based model. Compared with the GRU-CTC network, the word error rate (WER) decreased by 3%, and compared with the DPCNN-Attention-CTC speech recognition method, the error rate decreased by 5%.
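The abstract compares models by word error rate and relies on a CTC-trained acoustic model. As a minimal sketch of those two ideas (not the authors' code), the Python below computes WER as word-level Levenshtein distance, (S + D + I) / N, and performs greedy (best-path) CTC decoding, which collapses repeated frame labels and then drops the blank symbol. The function names, the blank index of 0, and the use of greedy decoding instead of the paper's RNN language-model decoding are all assumptions made for illustration.

```python
from __future__ import annotations


def word_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference)."""
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = edit distance between reference[:i] and hypothesis[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(m + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return dist[n][m] / n


def ctc_greedy_decode(frame_labels: list[int], blank: int = 0) -> list[int]:
    """Best-path CTC decoding: merge repeated labels, then remove blanks.

    Assumes frame_labels holds the argmax label of each frame and that
    the blank symbol has index `blank` (0 here, a hypothetical choice).
    """
    decoded: list[int] = []
    previous = None
    for label in frame_labels:
        # A label is emitted only when it differs from the previous frame
        # and is not blank; a blank between repeats allows a real repeat.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded
```

For example, `word_error_rate("the cat sat on the mat".split(), "the cat sit on mat".split())` counts one substitution and one deletion over six reference words, giving 2/6, and `ctc_greedy_decode([0, 3, 3, 0, 3, 5, 5, 0])` returns `[3, 3, 5]`, because the blank between the two runs of 3 keeps them as separate tokens.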
