Abstract

Automatic speech recognition (ASR) is vital for very low-resource languages because it helps mitigate the risk of extinction. Chaha is one such language: it suffers from resource insufficiency, and some of its phonological, morphological, and orthographic features complicate ASR development. Taking these challenges into account, this study is the first to analyze the characteristics of the language, prepare a speech corpus, and develop different ASR systems. A small 3-hour read speech corpus was prepared and transcribed. Different basic and rounded phone unit-based speech recognizers were explored using multilingual deep neural network (DNN) modeling methods. The experimental results demonstrate that all the basic phone and rounded phone unit-based multilingual models outperformed the corresponding unilingual models, with relative performance improvements of 5.47% to 19.87% and 5.74% to 16.77%, respectively. The rounded phone unit-based multilingual models outperformed the equivalent basic phone unit-based models, with relative performance improvements of 0.95% to 4.98%. Overall, multilingual DNN modeling methods proved highly effective for developing Chaha speech recognizers. Both basic and rounded phone acoustic units can be used to build a Chaha ASR system. However, the rounded phone unit-based models perform better and recognize speech faster than the corresponding basic phone unit-based models. Hence, rounded phone units are the most suitable acoustic units for developing Chaha ASR systems.

Highlights

  • Human language technologies (HLTs) are important for low-resource languages: they help revitalize and document these languages to prevent extinction, and they raise interest in the languages and make them attractive again to their native speakers [1]

  • The results demonstrate that the rounded phone unit-based model outperforms the basic phone unit-based model with a relative word error rate (WER) reduction of 1.88%

  • The time delay neural network (TDNN)-CH models are trained on the combined Chaha in-domain real and synthetic speech corpus. The results demonstrate that the rounded phone unit-based TDNN-CH model outperforms the equivalent basic phone unit-based model with an absolute WER reduction of 1.16%, as presented in the first row of Table 6
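
The highlights above report both relative and absolute WER reductions, which are easy to confuse. The following is a minimal sketch, not the authors' evaluation code, of how WER and the two kinds of reduction are commonly computed. The function names and the sample sentences and figures in the usage note are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Standard Levenshtein distance over words, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def absolute_reduction(baseline_wer: float, new_wer: float) -> float:
    """Difference between the two WERs, in percentage points."""
    return baseline_wer - new_wer


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Reduction expressed as a percentage of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer
```

For illustration only: with a hypothetical baseline WER of 20.0% and a new WER of 18.0%, `absolute_reduction(20.0, 18.0)` gives 2.0 percentage points, while `relative_reduction(20.0, 18.0)` gives 10% of the baseline.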

Introduction

Human language technologies (HLTs) are important for low-resource languages: they help revitalize and document these languages to prevent extinction, and they raise interest in the languages and make them attractive again to their native speakers [1]. Model regularization techniques can reduce overfitting to some extent, but to overcome such problems substantially and to develop reliable ASR systems for low-resource languages, it is better to increase the size of the training datasets. The training data can be enlarged in three ways: preparing a new training corpus, borrowing data from high-resource languages, and generating synthetic data via audio data augmentation techniques. The second and third options are preferable, namely borrowing training datasets from high-resource languages and generating synthetic datasets via data augmentation. Using these methods, previous works investigated different multilingual acoustic modeling paradigms. Phone sharing [2], multitask learning [3] [4] [5], and weight transfer [4] [5] were utilized to develop reliable ASR systems for low-resource languages.
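
To make the idea of synthetic data via audio augmentation concrete, here is a minimal sketch of two widely used techniques, speed perturbation and additive noise. The text above does not say which augmentation techniques the authors used, so these two are assumptions chosen for illustration, and the function names, factors, and SNR value are hypothetical.

```python
import math
import random


def speed_perturb(samples, factor):
    """Resample by linear interpolation so the clip plays `factor` times faster.

    A factor above 1 shortens the clip; a factor below 1 lengthens it.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    last = len(samples) - 1
    n_out = max(1, round(len(samples) / factor))
    out = []
    for i in range(n_out):
        pos = i * factor               # fractional position in the input
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out


def add_noise(samples, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    if not samples:
        return []
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


# Example: derive three synthetic variants from one (toy) waveform.
tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1000)]
variants = [
    speed_perturb(tone, 0.9),
    speed_perturb(tone, 1.1),
    add_noise(tone, 20.0, random.Random(0)),
]
```

Each variant keeps the original transcript, so a small corpus can be enlarged without new recordings. Production toolkits typically use higher-quality resampling than the linear interpolation shown here.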
