Abstract
Out-of-vocabulary (OOV) keywords present a challenge for keyword search (KWS) systems especially in the low-resource setting. Previous research has centered around approaches that use a variety of subword units to recover OOV words. This paper systematically investigates morphology-based subword modeling approaches on seven low-resource languages. We show that using morphological subword units (morphs) in speech recognition decoding is substantially better than expanding word-decoded lattices into subword units including phones, syllables and morphs. As alternatives to grapheme-based morphs, we apply unsupervised morphology learning to sequences of phonemes, graphones, and syllables. Using one of these phone-based morphs is almost always better than using the grapheme-based morphs, but the particular choice varies with the language. By combining the different methods, a substantial gain is obtained over the best single case for all languages, especially for OOV performance.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
More From: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.