Abstract

The Yulin dialect, a local dialect of China, has received little attention in speech recognition research, and the corpus data available for it is small. In this paper, we collect local dialect speech and text, build a speech database, analyze the dialect's pronunciation characteristics, and build a pronunciation dictionary that uses the dialect's vowels and rhymes as its base units. The aim is to address the low recognition performance of speech recognition systems on dialects, accents, and low-resource corpora. First, we use speed perturbation as a data augmentation scheme to increase the information contained in the input features during feature extraction. Second, we use CNN-TDNNF, a neural network that can be trained on long time series, as the acoustic model and combine it with an N-gram language model. Finally, the experimental results show that, in a dialect setting with a low-resource corpus, the performance of this scheme improves by 15.42% over a traditional dialect speech recognition system.
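The speed perturbation step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `speed_perturb` and `augment`, the linear interpolation, and the factors 0.9, 1.0, and 1.1 are assumptions chosen for illustration, since the abstract does not state which factors or resampling method were used.

```python
def speed_perturb(samples, factor):
    """Resample a waveform so that it plays `factor` times faster.

    Hypothetical helper: uses plain linear interpolation, which is an
    assumption and not necessarily the resampling used in the paper.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_in = len(samples)
    if n_in == 0:
        return []
    # A faster signal (factor > 1) has fewer samples, a slower one has more.
    n_out = int(round(n_in / factor))
    out = []
    for i in range(n_out):
        pos = i * factor              # fractional read position in the input
        left = int(pos)
        if left >= n_in - 1:
            # Past the last full interval: hold the final sample.
            out.append(float(samples[-1]))
            continue
        frac = pos - left
        out.append((1.0 - frac) * samples[left] + frac * samples[left + 1])
    return out


def augment(samples, factors=(0.9, 1.0, 1.1)):
    """Return one perturbed copy per factor (the factors are assumed values)."""
    return [speed_perturb(samples, f) for f in factors]


if __name__ == "__main__":
    wave = [float(i % 50) for i in range(1000)]
    for f, copy in zip((0.9, 1.0, 1.1), augment(wave)):
        print(f, len(copy))
```

Each perturbed copy keeps the original transcript, so with three factors the amount of training audio grows roughly threefold without any new recordings.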
