Abstract

Language Modeling (LM) is a subtask of Natural Language Processing (NLP) whose goal is to build a statistical model that learns and estimates a probability distribution of natural language over sentences of terms. Recently, many language models based on recurrent neural networks, a type of deep neural network designed for sequential data, have been proposed and have achieved remarkable results. However, they rely only on the words that occur in a sentence, even though every sentence carries additional morphological information, such as Part-of-Speech (POS) tags, that is essential to its structure and can serve as a useful feature for analysis. Although morphological information can benefit LM, feeding it to a neural language model is not straightforward: inserting features between words in a one-dimensional input array increases the number of time steps of the recurrent neural network and can thereby cause the vanishing gradient problem. To address this problem, we propose in this paper a CNN-LSTM based language model that treats the textual input to the network as multi-dimensional data. To feed this multi-dimensional input to a Long Short-Term Memory (LSTM) network, we use a convolutional neural network (CNN) with a 1×1 filter to reduce the dimensionality of the input, keeping the number of time steps between input words small and thus avoiding the vanishing gradient problem. In addition, because the CNN reduces the multi-dimensional data before it reaches the recurrent layer, our approach can be used as a plug-in with many customized LSTM-based language models. On the Penn Treebank corpus, our model improves perplexity not only over a vanilla LSTM but also over customized LSTM models.
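
The sketch below is one plausible, minimal reading of the architecture described above, not the authors' implementation: word and POS-tag embeddings are stacked as feature channels of a multi-dimensional input, a 1×1 convolution fuses them into a single vector per word position (so the LSTM's time steps are not increased), and the result feeds a standard LSTM language model. Names such as vocab_size, pos_size, emb_dim, and hidden_dim are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CNNLSTMLanguageModel(nn.Module):
    """Hypothetical sketch: word + POS features fused by a 1x1 conv before an LSTM."""
    def __init__(self, vocab_size, pos_size, emb_dim=200, hidden_dim=200):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.pos_emb = nn.Embedding(pos_size, emb_dim)
        # 1x1 convolution over the feature-channel axis: it mixes the word and POS
        # channels at each position without adding extra time steps between words.
        self.reduce = nn.Conv1d(in_channels=2 * emb_dim, out_channels=emb_dim,
                                kernel_size=1)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, words, pos_tags):
        # words, pos_tags: (batch, seq_len)
        w = self.word_emb(words)        # (batch, seq_len, emb_dim)
        p = self.pos_emb(pos_tags)      # (batch, seq_len, emb_dim)
        x = torch.cat([w, p], dim=-1)   # stack features per word, not per time step
        x = self.reduce(x.transpose(1, 2)).transpose(1, 2)  # 1x1 conv reduces channels
        h, _ = self.lstm(x)             # sequence length unchanged: one step per word
        return self.out(h)              # next-word logits

# Usage (illustrative sizes): logits = CNNLSTMLanguageModel(10000, 45)(word_ids, pos_ids)
```

Because the dimensionality reduction happens before the recurrent layer, the same fused input could in principle be passed to other LSTM variants, which is the plug-in property claimed in the abstract.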
