Abstract

Speech recognition systems achieve low accuracy on Uyghur, a low-resource language, because of its strong language-specific characteristics and the scarcity of public training datasets. To address this problem, and taking the characteristics of Uyghur into account, we build the language model on morpheme units and expand the training data with a mixture of data augmentation methods. We apply a 9-layer TDNN-F, which makes effective use of contextual information. In experiments on the open-source dataset THUYG-20, the best system achieves a 9.88% WER (Word Error Rate). Compared with the baseline system for this dataset, the WER is reduced by 6.7%, a significant improvement in Uyghur speech recognition accuracy that also provides a reference for speech recognition in other low-resource languages.
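
For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between the recognizer output and the reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring tool used in the experiments; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration: words are split on whitespace,
    which is an assumption and not necessarily how the paper tokenizes Uyghur.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a hypothesis with one substituted word and one missing word against a four-word reference scores a WER of 0.5.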
