Abstract

In this paper, we improve a morph-based speech recognition system by focusing adaptation efforts on acronyms (ACRs) and foreign proper names (FPNs). An unsupervised language model (LM) adaptation framework based on two-pass decoding is used, and vocabulary adaptation is applied alongside it. The aim is to improve both language and pronunciation modeling for FPNs and ACRs. A selection algorithm finds the most likely topically related foreign words and acronyms in in-domain text, and new pronunciation rules are generated for the selected words. Different kinds of morpheme adaptation operations are also evaluated on the ACR and FPN candidate words, to ensure that pronunciation adaptation gives the best possible results. Statistically significant improvements in average word error rate (WER) and term error rate (TER) are achieved by combining unsupervised LM adaptation with vocabulary adaptation focused on ACRs and FPNs.
