Abstract

This paper investigates pronunciation lexicons, multilingual bottleneck features, semi-supervised learning, and data selection as ways to improve automatic speech recognition (ASR) and keyword search (KWS) under a very low-resource condition. In this condition, only about 3 hours of transcribed speech data are available, and there are no manual pronunciations for the words in the transcriptions. Our experiments on Swahili, the OpenKWS15 surprise language, lead to four main conclusions. (1) The pronunciation lexicon has a great influence on KWS performance under the very limited language package (VLLP) condition when compared with the full language package (FLP) condition. (2) Multilingual bottleneck features (BNF) improve the performance of ASR and KWS, and combining them with semi-supervised learning improves performance further. (3) Training the language model (LM) on a large-scale text corpus greatly improves the performance of the KWS system and its underlying ASR; extending the vocabulary used for keyword search reduces the number of out-of-vocabulary (OOV) terms in the keyword list and thus slightly improves KWS performance. (4) The selection of the initial transcription data is important for the performance of KWS and the underlying ASR system.
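As an illustration of conclusion (3), the effect of vocabulary extension on the keyword list can be measured as an out-of-vocabulary rate. The following is a minimal sketch, not the procedure used in the paper: the function name `oov_rate`, the whitespace tokenization, and the toy keyword and vocabulary lists are all assumptions made for this example.

```python
def oov_rate(keywords, vocabulary):
    """Return the fraction of keywords that contain at least one word
    missing from the vocabulary (assumes whitespace-separated words)."""
    vocab = {word.lower() for word in vocabulary}
    if not keywords:
        return 0.0
    oov_keywords = [
        kw for kw in keywords
        if any(word not in vocab for word in kw.lower().split())
    ]
    return len(oov_keywords) / len(keywords)


# Hypothetical toy data, for illustration only.
keywords = ["habari", "asubuhi njema", "karibu", "safari ndefu"]
small_vocab = ["habari", "karibu", "safari"]
extended_vocab = small_vocab + ["asubuhi", "njema", "ndefu"]

small_rate = oov_rate(keywords, small_vocab)        # before extension
extended_rate = oov_rate(keywords, extended_vocab)  # after extension
print(f"OOV rate: {small_rate:.2f} -> {extended_rate:.2f}")
```

A multi-word keyword is counted as out-of-vocabulary here if any of its words is missing; other conventions are possible.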
