Abstract

The Yulin dialect, a regional dialect of China, has received little attention in speech recognition research, and little corpus data is available for it. In this paper, we collect local dialect speech and text, build a speech database, analyze the dialect's pronunciation characteristics, and construct a pronunciation dictionary that uses the dialect's vowels and rhymes as its basic units. The goal is to address the low recognition performance of speech recognition systems on dialectal, accented, and low-resource speech. First, we use speed perturbation as a data augmentation scheme to increase the information contained in the input features during feature extraction. Second, CNN-TDNNF, which can be trained on long time sequences, serves as the neural network model and is combined with an N-gram language model. Finally, experimental results show that the proposed scheme improves performance by 15.42% over a traditional dialect speech recognition system in a low-resource dialect setting.
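
The abstract does not say how the perturbation is implemented, so the following is only a minimal sketch of the general idea: resample each waveform so that it plays faster or slower at the original sampling rate, and add the perturbed copies to the training data. The function names `speed_perturb` and `augment`, the factors 0.9, 1.0, and 1.1, and the use of linear interpolation instead of a proper resampling filter are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def speed_perturb(waveform, factor):
    """Return a copy of `waveform` that plays `factor` times faster.

    Hypothetical helper: linear interpolation stands in for the
    band-limited resampling a real toolkit would use.
    """
    waveform = np.asarray(waveform, dtype=np.float64)
    # A faster signal has fewer samples at the same sampling rate.
    n_out = int(round(len(waveform) / factor))
    # Positions in the original signal that each output sample reads from.
    src_positions = np.arange(n_out) * factor
    src_positions = np.clip(src_positions, 0, len(waveform) - 1)
    return np.interp(src_positions, np.arange(len(waveform)), waveform)


def augment(waveform, factors=(0.9, 1.0, 1.1)):
    """Build one perturbed copy per factor (factors are assumed values)."""
    return {factor: speed_perturb(waveform, factor) for factor in factors}


if __name__ == "__main__":
    # One second of a 440 Hz tone at 16 kHz as stand-in audio.
    sample_rate = 16000
    t = np.arange(sample_rate) / sample_rate
    tone = np.sin(2 * np.pi * 440 * t)
    for factor, copy in augment(tone).items():
        print(factor, len(copy) / sample_rate, "seconds")
```

With three factors, every utterance yields three training examples of different durations, which is how this kind of augmentation enlarges a small corpus without new recordings.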
