Abstract
We present a method for the unsupervised discovery of sub-word units (SWUs) and associated pronunciation lexicons for use in automatic speech recognition (ASR) systems. It comprises a novel SWU discovery approach based on self-organising HMM-GMM states that are agglomeratively tied across words, and a novel pronunciation lexicon induction approach that iteratively reduces pronunciation variation by means of model pruning. Our approach relies only on recorded speech and the associated orthographic transcriptions, and does not require alphabetic graphemes. We apply our methods to corpora of recorded radio broadcasts in Ugandan English, Luganda and Acholi, the latter two of which are under-resourced. The speech is conversational and contains high levels of background noise, and therefore presents a challenge to automatic lexicon induction. We demonstrate that, when used to train DNN-HMM ASR systems, the lexicons discovered by our method perform as well as baseline systems built on expert-designed lexicons for Acholi, and close to this level for the other two languages. This demonstrates the potential of the method to enable and accelerate ASR development for under-resourced languages that lack a phone inventory and pronunciation lexicon, by eliminating the dependence on the human expertise that creating these resources usually requires.
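To make the idea of agglomeratively tying states across words more concrete, the following is a minimal sketch and not the authors' method. It assumes, for simplicity, that each HMM state is summarised by a single diagonal Gaussian (frame count, mean, variance) rather than a full GMM. It greedily merges the pair of states whose merge causes the smallest loss in training-data log-likelihood, and it stops once that loss exceeds a threshold. All names (`GaussStats`, `merge_cost`, `agglomerative_tie`, `max_cost`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class GaussStats:
    """Hypothetical state summary: a diagonal Gaussian fitted to n frames."""
    n: float
    mean: List[float]
    var: List[float]


def loglik(g: GaussStats) -> float:
    # Log-likelihood of the n frames under their own ML-fitted diagonal Gaussian.
    return -0.5 * g.n * sum(math.log(2 * math.pi * v) + 1.0 for v in g.var)


def merge(a: GaussStats, b: GaussStats) -> GaussStats:
    # Pool the sufficient statistics of two states into one Gaussian.
    n = a.n + b.n
    mean = [(a.n * ma + b.n * mb) / n for ma, mb in zip(a.mean, b.mean)]
    var = [
        (a.n * (va + ma * ma) + b.n * (vb + mb * mb)) / n - m * m
        for va, ma, vb, mb, m in zip(a.var, a.mean, b.var, b.mean, mean)
    ]
    var = [max(v, 1e-6) for v in var]  # variance floor for numerical safety
    return GaussStats(n, mean, var)


def merge_cost(a: GaussStats, b: GaussStats) -> float:
    # Loss in log-likelihood caused by tying the two states together.
    return loglik(a) + loglik(b) - loglik(merge(a, b))


def agglomerative_tie(states: List[GaussStats], max_cost: float) -> List[List[int]]:
    """Greedily tie states; return groups of original state indices."""
    clusters = {i: s for i, s in enumerate(states)}
    members = {i: [i] for i in range(len(states))}
    while len(clusters) > 1:
        keys = sorted(clusters)
        cost, i, j = min(
            (merge_cost(clusters[p], clusters[q]), p, q)
            for x, p in enumerate(keys)
            for q in keys[x + 1:]
        )
        if cost > max_cost:
            break  # cheapest remaining merge loses too much likelihood
        clusters[i] = merge(clusters[i], clusters[j])
        members[i].extend(members.pop(j))
        del clusters[j]
    return sorted(sorted(m) for m in members.values())


if __name__ == "__main__":
    # Two pairs of near-identical 1-D states, e.g. from different words.
    states = [
        GaussStats(100, [0.0], [1.0]),
        GaussStats(100, [0.1], [1.0]),
        GaussStats(100, [10.0], [1.0]),
        GaussStats(100, [10.1], [1.0]),
    ]
    print(agglomerative_tie(states, max_cost=10.0))  # [[0, 1], [2, 3]]
```

Each resulting group of tied states could then be treated as one candidate sub-word unit shared across the words in which those states occur. The single-Gaussian assumption keeps the merge cost in closed form; with full GMM states the cost would have to be estimated, for example by retraining after each merge.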