Unsupervised Adaptation of a Stochastic Language Model Using a Japanese Raw Corpus

G Kurata, S Mori, M Nishimura

doi:10.1109/icassp.2006.1660201


Publication Date: May 14, 2006

Citations: 10

Affiliation: IBM Research - Tokyo

Keywords: Large Vocabulary Continuous Speech Recognition, Large Vocabulary Continuous Speech Recognition System


Abstract

Large Vocabulary Continuous Speech Recognition (LVCSR) systems are being applied to a growing range of domains. Building a good LVCSR system specialized for a target domain takes a long time because experts must manually segment the target-domain corpus, which is a labor-intensive task. In this paper, we propose a new method for adapting an LVCSR system to a new domain. The method stochastically segments a raw Japanese corpus of the target domain and then builds a domain-specific Language Model (LM) from the segmented corpus. All of the domain-specific words can be added to the LVCSR lexicon. Most importantly, the proposed method is fully automatic, so it drastically reduces the time needed to introduce an LVCSR system. In addition, the proposed method yielded performance comparable or even superior to that obtained with expensive manual segmentation.
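The abstract does not spell out how the stochastic segmentation feeds the language model, so the following is only a minimal sketch under a stated assumption: each position between two adjacent characters carries a word-boundary probability, and the expected count of a candidate word is accumulated as the probability of a boundary before it, no boundary inside it, and a boundary after it. The function name `expected_word_counts`, the `max_len` cutoff, and the example probabilities are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def expected_word_counts(text, boundary_probs, max_len=4):
    """Accumulate expected unigram counts from one unsegmented sentence.

    text           -- a string of n characters (no spaces, as in raw Japanese).
    boundary_probs -- n - 1 probabilities; boundary_probs[i] is the assumed
                      probability of a word boundary between text[i] and
                      text[i + 1].
    max_len        -- hypothetical cap on candidate word length, in characters.
    """
    n = len(text)
    if len(boundary_probs) != n - 1:
        raise ValueError("need exactly len(text) - 1 boundary probabilities")

    # p[k] is the boundary probability just before text[k].
    # The sentence start and end are treated as certain boundaries.
    p = [1.0] + list(boundary_probs) + [1.0]

    counts = defaultdict(float)
    for start in range(n):
        no_inner_boundary = 1.0
        for end in range(start + 1, min(start + max_len, n) + 1):
            if end - start > 1:
                # Extending the candidate by one character makes position
                # end - 1 an inner position that must NOT be a boundary.
                no_inner_boundary *= 1.0 - p[end - 1]
            weight = p[start] * no_inner_boundary * p[end]
            if weight > 0.0:
                counts[text[start:end]] += weight
    return dict(counts)


if __name__ == "__main__":
    # Made-up probabilities for illustration only.
    sentence = "音声認識"
    probs = [0.1, 0.8, 0.1]
    for word, count in sorted(expected_word_counts(sentence, probs).items()):
        print(word, round(count, 4))
```

Under this assumption, the fractional counts could replace the integer counts that a manually segmented corpus would supply when estimating n-gram probabilities, and any candidate string with a sufficiently high expected count could be considered for the lexicon. How the boundary probabilities themselves are estimated is left outside this sketch.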
