Towards Robust Indonesian Speech Recognition with Spontaneous-Speech Adapted Acoustic Models

Devin Hoesen, Cil Hardianto Satriawan, Dessi Puji Lestari, Masayu Leylia Khodra

doi:10.1016/j.procs.2016.04.045

Abstract

This paper presents our work in building an Indonesian speech recognizer that handles both spontaneous and dictated speech. The recognizer is based on Gaussian Mixture Models and Hidden Markov Models (GMM-HMM). The model is first trained on 73 hours of dictated speech and 43.5 minutes of spontaneous speech. The dictated speech is read from prepared transcripts by a diverse group of 244 Indonesian speakers. The spontaneous speech is manually labelled from recordings of an Indonesian parliamentary meeting and is interspersed with noise and fillers. The resulting triphone model is then adapted to the spontaneous speech only, using the Maximum A Posteriori (MAP) method. We evaluate the adapted model on separate dictated and spontaneous evaluation sets. The dictated set consists of speech from 20 speakers totaling 14.5 hours. The spontaneous set is derived from the recording of a regional government meeting and consists of 1085 utterances totaling 48.5 minutes. On the spontaneous set, the MAP-adapted model yields a 2.60% absolute increase in Word Accuracy Rate (WAR) over the un-adapted model, outperforming MMI adaptation. Conversely, on the dictated set, MMI adaptation outperforms MAP adaptation, achieving an absolute increase of 1.48% in WAR over the un-adapted model. We also demonstrate that fMLLR speaker adaptation is unsuitable for our task due to limited adaptation data.
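The results above are reported as absolute changes in Word Accuracy Rate. As a minimal sketch, assuming the common definition WAR = 1 - (S + D + I) / N, where S, D, and I are the substitutions, deletions, and insertions in a minimum-edit-distance word alignment and N is the number of reference words, the metric can be computed as follows. The function name `word_accuracy` and the example sentences are illustrative assumptions, not taken from the paper, and the authors' scoring tool may differ in details such as text normalization.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy as 1 - (S + D + I) / N (assumed definition, see lead-in)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr

    errors = prev[len(hyp)]  # S + D + I for the best alignment
    return 1.0 - errors / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one reference word is dropped by the recognizer.
    print(word_accuracy("saya pergi ke pasar", "saya pergi pasar"))
```

Under this definition, insertions can push the value below zero for very noisy hypotheses. An "absolute increase" of 2.60% means the adapted model's WAR minus the un-adapted model's WAR is 2.60 percentage points.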

Full Text