Abstract

We present a method enabling the unsupervised discovery of sub-word units (SWUs) and associated pronunciation lexicons for use in automatic speech recognition (ASR) systems. This includes a novel SWU discovery approach based on self-organising HMM-GMM states that are agglomeratively tied across words as well as a novel pronunciation lexicon induction approach that iteratively reduces pronunciation variation by means of model pruning. Our approach relies only on recorded speech and associated orthographic transcriptions and does not require alphabetic graphemes. We apply our methods to corpora of recorded radio broadcasts in Ugandan English, Luganda and Acholi, of which the latter two are under-resourced. The speech is conversational and contains high levels of background noise, and therefore presents a challenge to automatic lexicon induction. We demonstrate that our proposed method is able to discover lexicons that perform as well as baseline expert systems for Acholi, and close to this level for the other two languages when used to train DNN-HMM ASR systems. This demonstrates the potential of the method to enable and accelerate ASR for under-resourced languages for which a phone inventory and pronunciation lexicon are not available by eliminating the dependence on human expertise this usually requires.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call