Abstract

This paper presents a language-model training method for improving automatic transcription of online spoken content. Unlike previously studied LVCSR tasks such as broadcast news and lectures, large task-specific corpora for training language models cannot be prepared in advance because of the diversity of topics, vocabularies, and speaking styles. To overcome the difficulty of preparing such task-specific language models, we propose collaborative training of language models based on the wisdom of crowds. On PodCastle, our public web service for LVCSR-based spoken document retrieval, over half a million recognition errors have been corrected by anonymous users. By leveraging such corrected transcriptions, component language models for various topics can be built and dynamically mixed to generate an appropriate language model for each podcast episode in an unsupervised manner. Experimental results with Japanese podcasts showed that the mixed language models significantly reduced the word error rate.
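The unsupervised mixing described above can be illustrated with a minimal sketch: component models are linearly interpolated, and per-episode mixture weights are estimated by EM on a first-pass transcript. This is an illustrative assumption, not the paper's exact formulation; the function names, the unigram representation, and the EM weight estimation are all simplifications introduced here.

```python
def interpolate(models, weights, word, floor=1e-9):
    """P(word) under a linear mixture of component unigram models.

    `models` is a list of dicts mapping words to probabilities
    (a toy stand-in for full n-gram component language models).
    """
    return sum(w * m.get(word, floor) for w, m in zip(weights, models))

def em_mixture_weights(models, transcript, iters=20, floor=1e-9):
    """Estimate per-episode mixture weights by EM on a first-pass
    transcript, with no manual supervision (hypothetical sketch)."""
    k = len(models)
    weights = [1.0 / k] * k
    for _ in range(iters):
        counts = [0.0] * k
        for word in transcript:
            # E-step: posterior responsibility of each component model
            post = [w * m.get(word, floor) for w, m in zip(weights, models)]
            z = sum(post)
            for i in range(k):
                counts[i] += post[i] / z
        # M-step: renormalize expected counts into new weights
        total = sum(counts)
        weights = [c / total for c in counts]
    return weights
```

For example, if a first-pass transcript of an episode matches a technology-topic component model far better than a news-topic one, EM assigns that component a correspondingly large weight, yielding an episode-specific mixture without any labeled data.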

