Abstract

In this paper, we investigated various word clustering methods by studying two clustering algorithms, Brown clustering and the exchange algorithm, and three objective functions derived from different class-based language models (CBLM): the two-sided, predictive, and conditional models. In particular, we focused on a speed-optimized implementation of the exchange algorithm. In total, we compared six clustering methods in terms of runtime and the perplexity (PP) of the CBLM on a French corpus, and showed that our accelerated implementation of the exchange algorithm is up to 114 times faster than the original and around 6 times faster than the best implementation of Brown clustering we could find, while achieving about the same (slightly better) PP. In addition, we conducted a keyword search experiment on the Babel Lithuanian task (IARPA-babel304b-v1.0b), which showed that the CBLM improves the word error rate (WER) but not the keyword search performance. Furthermore, we used these clustering techniques for the output layer of a recurrent neural network (RNN) language model (LM) and showed that, in terms of the PP of the RNN LM, word classes trained under the predictive model perform slightly better than those trained under the other criteria we considered.

Index Terms: word clustering, language modeling, neural network based language model, recurrent neural network, long short-term memory
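For context, the three class-based bigram factorizations named above are commonly written as follows, where c(w) denotes the class assigned to word w. This is a sketch of the standard formulations associated with these names, not necessarily the paper's exact definitions:

p(w_n \mid w_{n-1}) = p(w_n \mid c(w_n)) \, p(c(w_n) \mid c(w_{n-1}))   (two-sided)
p(w_n \mid w_{n-1}) = p(w_n \mid c(w_n)) \, p(c(w_n) \mid w_{n-1})      (predictive)
p(w_n \mid w_{n-1}) = p(w_n \mid c(w_{n-1}))                            (conditional)

Likewise, a minimal Python sketch of the naive exchange algorithm under the two-sided objective may help fix ideas. All function names and the toy corpus below are ours; the accelerated variants the paper refers to update only the count deltas affected by a move, rather than rescoring the whole corpus as done here for clarity:

import math
from collections import defaultdict

def two_sided_objective(corpus, cls):
    # Terms of the two-sided class-bigram log-likelihood that change under
    # exchanges: sum_{c,c'} N(c,c') log N(c,c') - 2 * sum_c N(c) log N(c).
    # The word-unigram terms are invariant under exchanges and omitted.
    bigrams, unigrams = defaultdict(int), defaultdict(int)
    for u, v in zip(corpus, corpus[1:]):
        bigrams[(cls[u], cls[v])] += 1
    for u in corpus:
        unigrams[cls[u]] += 1
    return (sum(n * math.log(n) for n in bigrams.values())
            - 2 * sum(n * math.log(n) for n in unigrams.values()))

def exchange_clustering(corpus, num_classes, num_iters=20):
    vocab = sorted(set(corpus))
    # Start from a simple round-robin assignment of words to classes.
    cls = {w: i % num_classes for i, w in enumerate(vocab)}
    for _ in range(num_iters):
        moved = False
        for w in vocab:
            old_class = cls[w]
            # Tentatively move w to every class and keep the best move.
            # A fast implementation would compute the objective delta from
            # the affected counts only, instead of rescoring the corpus.
            scores = []
            for c in range(num_classes):
                cls[w] = c
                scores.append((two_sided_objective(corpus, cls), c))
            _, best_class = max(scores)
            cls[w] = best_class
            moved |= best_class != old_class
        if not moved:  # no word changed class: a local optimum is reached
            break
    return cls

corpus = "the cat sat on the mat the dog sat on the rug".split()
print(exchange_clustering(corpus, num_classes=3))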
