Abstract

In recent years, automatic speech recognition (ASR) systems have often used language models as an adjunct to the ASR model. The density ratio approach (DRA) is one such language model integration method. Japanese has a much larger character inventory than alphabetic languages, and readings vary because of homonyms and characters with multiple pronunciations. It was unclear whether the “implicit language information” of a character-based encoder–decoder ASR model using beam search can be approximated by an external language model. In our experiments, we applied the DRA to a Japanese encoder–decoder ASR model to reduce the character error rate (CER) in cross-domain scenarios. Cross-domain CERs were calculated for the Japanese academic presentation speech (APS) corpus and the Japanese simulated presentation speech (SPS) corpus. The DRA achieved relative error reductions of 11.0% and 22.5% with the RNN and Transformer models, respectively, compared to shallow fusion. To investigate applicability across different speaking styles and domains, we also conducted an experiment replacing the “implicit language information” inside the CSJ ASR model with a Mainichi Shimbun language model. On the JNAS task, the DRA achieved a relative error reduction of 7.3% compared to the shallow fusion method.
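The scoring difference between shallow fusion and the density ratio approach can be sketched as follows. This is a minimal illustration, not the authors' implementation: the weight names `lam_ext` and `lam_src` and the toy log-probability values are assumptions. Shallow fusion adds an external LM score to the ASR hypothesis score, while the DRA additionally subtracts a source-domain LM score, so the external (target-domain) LM effectively replaces the ASR model's implicit language information.

```python
def shallow_fusion_score(asr_logp, ext_lm_logp, lam_ext=0.5):
    """Shallow fusion: add the weighted external LM log-probability
    to the ASR model's hypothesis log-probability."""
    return asr_logp + lam_ext * ext_lm_logp


def density_ratio_score(asr_logp, ext_lm_logp, src_lm_logp,
                        lam_ext=0.5, lam_src=0.5):
    """Density ratio approach: also subtract the weighted
    source-domain LM log-probability, cancelling the implicit LM
    learned from the ASR training data."""
    return asr_logp + lam_ext * ext_lm_logp - lam_src * src_lm_logp


# Toy rescoring of one beam-search hypothesis (illustrative values).
asr_logp = -1.0      # ASR decoder score for a candidate character sequence
ext_lm_logp = -2.0   # target-domain LM score
src_lm_logp = -3.0   # source-domain LM score (training-data domain)

sf = shallow_fusion_score(asr_logp, ext_lm_logp)          # -2.0
dra = density_ratio_score(asr_logp, ext_lm_logp, src_lm_logp)  # -0.5
```

In beam search, each partial hypothesis would be rescored this way at every decoding step; hypotheses the source-domain LM favors but the target-domain LM does not are penalized under the DRA.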
