Research on dialect speech recognition based on DenseNet-CTC

doi:10.25236/ajcis.2023.060204

Open Access


Abstract

Most current speech recognition research targets languages spoken across wide regions, and few studies address speech recognition for urban dialects. Mainstream speech recognition methods are mostly based on ResNet, using a ResNet network as the acoustic model and an N-gram as the language model. This study uses DenseNet as the base network and takes a data set of the Zigong dialect, a subdivision of the Sichuan dialect, as the object of speech recognition research. A DenseNet-BiGRU network trained with CTC is constructed as the acoustic model, and an RNN is used as the language model. Experiments show that the speech recognition model built on DenseNet achieves higher accuracy than the model based on ResNet. Compared with the GRU-CTC network, the word error rate (WER) decreases by 3%; compared with the DPCNN-Attention-CTC speech recognition method, the error rate decreases by 5%.
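The comparisons above are reported as word error rate. As a minimal sketch of the standard WER definition (substitutions, deletions, and insertions divided by the number of reference tokens), the following may help make the metric concrete. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the abstract does not state how the authors tokenized the Zigong dialect transcripts or computed the score.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the edit distance between token sequences divided by reference length.

    Assumption: tokens are separated by whitespace. For Chinese dialect text,
    a character-level or segmented tokenization may be used instead.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, ref_token in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference tokens to reach an empty hypothesis
        for j, hyp_token in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_token == hyp_token else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr

    return prev[-1] / len(ref)
```

For example, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference tokens. Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.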

Full Text