Development of Automatic Speech Recognition for Xitsonga Using Subspace Gaussian Mixture Model

Vukosi Rikhotso, Madimetja Jonas Manamela, Thipe Modipa, Tumisho Bilson Mokgonyane

doi:10.1109/icabcd51485.2021.9519355

Abstract

Speech is the most common and efficient form of communication between people. Automatic speech recognition (ASR), also called speech-to-text (STT), is the conversion of spoken language into text. In this paper, an ASR system for the under-resourced South African language Xitsonga is developed to help mother-tongue speakers who cannot speak English. The development of speech technology applications focuses on improving system accuracy, and the challenge is to reduce the word error rate (WER) of systems for under-resourced languages. We use the Subspace Gaussian Mixture Model (SGMM) technique with the Kaldi toolkit to develop the ASR system for Xitsonga using the National Centre for Human Language Technology (NCHLT) corpus. The corpus used totals 56.26 hours. The WER achieved is 38.61%, which is promising given the size of the data.
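For readers unfamiliar with the metric, WER is conventionally defined as (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions needed to turn the recognised text into the reference transcript, and N is the number of reference words. The Python sketch below illustrates that standard definition with a word-level edit distance. It is not the scoring code used in the paper (Kaldi ships its own scoring tools), and the function name and the placeholder tokens in the usage example are assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative helper (hypothetical name), using whitespace tokenisation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder tokens, not real Xitsonga data: one substitution (w2 -> wX)
    # and one deletion (w4) against a four-word reference.
    print(word_error_rate("w1 w2 w3 w4", "w1 wX w3"))  # 0.5
```

Under this definition, the reported WER of 38.61% corresponds to roughly 39 word errors per 100 reference words. WER can exceed 100% when the hypothesis contains many insertions.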

Full Text