Abstract

This paper describes large-vocabulary continuous-speech recognition (LVCSR) of Japanese newspaper speech read aloud and of Japanese broadcast-news speech, presenting the first Japanese LVCSR experiments using morpheme-based statistical language models. The language models were trained on a large text corpus built from several years of newspaper articles, and the LVCSR system was evaluated on newspaper speech read by 10 male speakers. Training statistical n-gram language models for Japanese is difficult because Japanese sentences are written without spaces between words. This difficulty was overcome by segmenting each sentence into words with a morphological analyzer and then training the n-gram language models on those words. The LVCSR system combined the newspaper-trained language models with acoustic models: phoneme hidden Markov models (HMMs) trained on 20 h of speech. Recognition results for read newspaper speech with a 7k-word vocabulary were comparable to those reported for other languages. For automatic transcription of broadcast-news speech with our LVCSR system, the language models had 20k-word vocabularies and were trained on broadcast-news manuscripts; these models achieved better performance than the models trained on newspaper texts. Our experiments indicate that LVCSR for Japanese works in much the same way as LVCSR for European languages.
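The segment-then-train approach described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the pre-segmented word lists stand in for the output of a morphological analyzer (the analyzer itself is not shown), and the romanized toy corpus is an invented example.

```python
from collections import Counter


def train_bigram_counts(segmented_sentences):
    """Count unigrams and bigrams over sentences already split into words.

    Japanese text has no spaces between words, so each sentence is assumed
    to have been segmented into a word list by a morphological analyzer
    beforehand (that step is hypothetical here and not shown).
    """
    unigrams, bigrams = Counter(), Counter()
    for words in segmented_sentences:
        tokens = ["<s>"] + words + ["</s>"]  # sentence-boundary markers
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(w1, w2, unigrams, bigrams):
    """Maximum-likelihood bigram probability P(w2 | w1)."""
    if unigrams[w1] == 0:
        return 0.0
    return bigrams[(w1, w2)] / unigrams[w1]


# Toy "segmented" corpus, romanized for readability (illustrative only).
corpus = [["watashi", "wa", "gakusei", "desu"],
          ["watashi", "wa", "sensei", "desu"]]
uni, bi = train_bigram_counts(corpus)
print(bigram_prob("watashi", "wa", uni, bi))  # 1.0: "wa" always follows "watashi"
print(bigram_prob("wa", "gakusei", uni, bi))  # 0.5: "gakusei" follows "wa" in 1 of 2 cases
```

A production system would add smoothing for unseen n-grams and a fixed vocabulary (7k or 20k words in the paper), but the core idea is the same: once sentences are segmented into words, Japanese n-gram training proceeds exactly as for languages with explicit word boundaries.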
