Abstract
Language modeling with n-grams is popular for speech recognition and many other applications. However, the conventional n-gram suffers from insufficient training data and an inability to capture domain knowledge and long-distance language dependencies. This paper presents a new approach to mining long-distance word associations and incorporating their mutual information into language models. We aim to discover associations of multiple distant words from the training corpus. An efficient algorithm is developed to merge the frequent word subsets and construct the association patterns. The resulting association pattern n-gram is general, with the trigger-pair n-gram as a special realization in which only associations of two distant words are considered. To improve the modeling, we further compensate for sparse training data via parameter smoothing and for domain mismatch via online adaptive learning. The proposed association pattern n-gram and several hybrid models are successfully applied to speech recognition. We also find that incorporating the mutual information of association patterns significantly reduces the perplexities of language models.
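To make the trigger-pair idea concrete, the following is a minimal sketch (not the paper's algorithm) of how pointwise mutual information can be estimated for distant word pairs co-occurring within a history window; the function name `trigger_pair_pmi` and the window-based counting scheme are illustrative assumptions.

```python
import math
from collections import Counter

def trigger_pair_pmi(corpus, window=5):
    """Estimate pointwise mutual information for distant word pairs
    (trigger pairs) that co-occur within `window` positions.
    `corpus` is a list of tokenized sentences. Illustrative sketch only."""
    unigrams = Counter()
    pairs = Counter()
    n_tokens = 0
    n_pairs = 0
    for sent in corpus:
        unigrams.update(sent)
        n_tokens += len(sent)
        for i, w in enumerate(sent):
            # pair each word with the later words inside its window
            for v in sent[i + 1 : i + 1 + window]:
                pairs[(w, v)] += 1
                n_pairs += 1
    pmi = {}
    for (w, v), c in pairs.items():
        p_wv = c / n_pairs
        p_w = unigrams[w] / n_tokens
        p_v = unigrams[v] / n_tokens
        # PMI(w, v) = log P(w, v) / (P(w) P(v))
        pmi[(w, v)] = math.log(p_wv / (p_w * p_v))
    return pmi
```

Pairs with high PMI would then be the candidate triggers whose mutual information is folded into the language model; extending the counts from word pairs to frequent word subsets yields the more general association patterns described above.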