Abstract

For automatic speech recognition in under-resourced environments, techniques for sharing acoustic data between closely related or similar languages become important. Donor languages with abundant resources can potentially be used to increase the recognition accuracy of speech systems developed in the resource-poor target language, on the assumption that adding more data will increase the robustness of the statistical estimates captured by the acoustic models. In this study we investigated data sharing between Afrikaans and Flemish, an under-resourced and a well-resourced language, respectively. Our approach focused on the model adaptation and refinement techniques associated with hidden Markov model (HMM)-based speech recognition systems, with the aim of improving the benefit of sharing data. Specifically, we examined currently available techniques, some possible combinations, and the exact point at which each technique is applied during acoustic model development. Our findings show that standard approaches to adaptation and refinement yield no benefit when Flemish data are simply added to the Afrikaans training pool. The only improvement we observed came from developing acoustic models on all available data while estimating model refinements and adaptations on the target data only.
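
The closing observation suggests a simple recipe: estimate the HMM parameters on the pooled data, then let the adaptation statistics come from Afrikaans alone. As a minimal sketch of how such target-only adaptation can work, the following shows MAP re-estimation of a single Gaussian mean in the classic Gauvain and Lee form; all names used here are illustrative and not taken from the paper.

```python
# Minimal sketch of MAP adaptation of a single Gaussian mean (Gauvain & Lee
# formulation). Names such as map_adapt_mean and tau are illustrative.
import numpy as np

def map_adapt_mean(prior_mean, frames, posteriors, tau=20.0):
    """Re-estimate a pooled-data mean on target-language data.

    prior_mean : (d,) mean trained on the pooled Afrikaans + Flemish data
    frames     : (T, d) target-language (Afrikaans) feature vectors
    posteriors : (T,) occupation probabilities gamma_t for this Gaussian
    tau        : prior weight; larger values trust the pooled model more
    """
    occ = posteriors.sum()                            # soft frame count
    obs_mean = posteriors @ frames / max(occ, 1e-8)   # ML mean from target data
    # MAP estimate: count-weighted interpolation of prior and target statistics
    return (tau * prior_mean + occ * obs_mean) / (tau + occ)

# Example: 50 frames of 13-dimensional features for one HMM state
rng = np.random.default_rng(0)
frames = rng.normal(loc=1.0, size=(50, 13))
gamma = np.full(50, 0.9)
adapted = map_adapt_mean(np.zeros(13), frames, gamma)
```

With few Afrikaans frames for a state (small soft count), the estimate stays close to the pooled-data prior; with abundant frames it converges to the target-only maximum likelihood mean.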
 
Significance:

  • Acoustic modelling for under-resourced languages
  • Automatic speech recognition for Afrikaans
  • Data sharing between Flemish and Afrikaans to improve acoustic modelling for Afrikaans
 
Highlights

  • Speech interfaces to different types of technology are becoming increasingly common

  • Experimental results are presented for constrained maximum likelihood linear regression (CMLLR) and maximum a posteriori (MAP) adaptation, as well as for combinations of heteroscedastic linear discriminant analysis (HLDA) and speaker adaptive training (SAT); a sketch of the CMLLR transform follows this list

  • It would seem that both CMLLR and MAP provide insufficient mechanisms to effectively combine data from different sources in the context of cross-language data sharing

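As referenced in the second highlight, here is a minimal sketch of the CMLLR transform. CMLLR works in feature space: one affine map per speaker or data source is applied to the observations, so the HMM Gaussians themselves stay untouched. This is not the paper's implementation: estimating A and b needs an iterative row-wise maximum-likelihood procedure (available in toolkits such as HTK or Kaldi), so only the application step is shown, with a hypothetical precomputed transform.

```python
# Minimal sketch of applying a constrained MLLR (CMLLR, also called fMLLR)
# feature-space transform. A and b are assumed to have been estimated
# elsewhere; this shows only the application step.
import numpy as np

def apply_cmllr(features, A, b):
    """Map each observation x to A x + b.

    features : (T, d) observation vectors for one speaker / data source
    A        : (d, d) transform matrix estimated on that source's data
    b        : (d,) bias vector
    """
    return features @ A.T + b

# Example with an identity transform (i.e. no adaptation)
d = 13
obs = np.zeros((10, d))
unchanged = apply_cmllr(obs, np.eye(d), np.zeros(d))
```

Because a single affine map must account for an entire speaker or data source, a large acoustic mismatch between Flemish and Afrikaans may exceed what CMLLR can correct, which would be consistent with the third highlight above.
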

Introduction

Speech interfaces to different types of technology are becoming increasingly common. Users can use their voice to search the Internet, control the volume of their car radio or dictate text. This possibility is only available to users if the required technology exists in the language they speak. In a study by Adda-Decker et al. [6], in which no acoustic data were available for the target language (Luxembourgish), English, French and German data sets were used to train a multilingual as well as three monolingual ASR systems; it was therefore not possible to compare the performance of the German models with models trained on the target language.
