Abstract

Speech dataset is an essential component in building commercial speech applications. However, low-resource languages such as Swahili lack such a resource that is vital for spoken digit recognition. For languages where such resources exist, they are usually insufficient. Thus, pre-training methods have been used with external resources to improve continuous speech recognition. However, to the best of our knowledge, no study has investigated the effect of pre-training methods specifically for spoken digit recognition. This study aimed at addressing these problems. First, we developed a Swahili spoken digit dataset for Swahili spoken digit recognition. Then, we investigated the effect of cross-lingual and multi-lingual pre-training methods on spoken digit recognition. Finally, we proposed an effective language-independent pre-training method for spoken digit recognition. The proposed method has the advantage of incorporating target language data during the pre-training stage that leads to an optimal solution when using less training data. Experiments on Swahili (being developed), English, and Gujarati datasets show that our method achieves better performance compared with all the baselines listed in this study.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.