Swahili Speech Dataset Development and Improved Pre-training Method for Spoken Digit Recognition

Alexander R Kivaisi, Qingjie Zhao, Jimmy T Mbelwa

doi:10.1145/3597494

Abstract

A speech dataset is an essential component in building commercial speech applications. However, low-resource languages such as Swahili lack such a resource, which is vital for spoken digit recognition. For languages where such resources exist, they are usually insufficient. Thus, pre-training methods have been used with external resources to improve continuous speech recognition. However, to the best of our knowledge, no study has investigated the effect of pre-training methods specifically on spoken digit recognition. This study aimed to address these problems. First, we developed a Swahili spoken digit dataset for Swahili spoken digit recognition. Then, we investigated the effect of cross-lingual and multi-lingual pre-training methods on spoken digit recognition. Finally, we proposed an effective language-independent pre-training method for spoken digit recognition. The proposed method has the advantage of incorporating target-language data during the pre-training stage, which leads to an optimal solution when using less training data. Experiments on the Swahili (developed in this study), English, and Gujarati datasets show that our method achieves better performance than all the baselines listed in this study.

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Publication Date: Jul 20, 2023
Citations: 2

Similar Papers

Spoken digits recognition by subspace decomposition method
K Kusakari ... T Murakami
01 Jan 2004

Spoken digits recognition using DP matching combined with a subspace decomposition method
Ken Kusakari ... Yoshihisa Ishida
The Journal of the Acoustical Society of America | VOL. 114
01 Oct 2003

Low-Resource Language Processing Using Improved Deep Learning with Hunter–Prey Optimization Algorithm
Fahd N Al-Wesabi ... Hala J Alshahrani
Mathematics | VOL. 11
30 Oct 2023

A fast and efficient pre-training method based on layer-by-layer maximum discrimination for deep neural networks
Seyyede Zohreh Seyyedsalehi ... Seyyed Ali Seyyedsalehi
Neurocomputing | VOL. 168
22 May 2015
