Abstract

Next word prediction is an application of language models to natural language generation in which text is produced by repeatedly sampling the next word conditioned on the previous choices. This paper proposes a stacked language model for optimized next word generation and evaluates three model variants. In stage I, word meaning is captured through a learned embedding and the structure of the text sequence is encoded using a stacked Long Short-Term Memory (LSTM) network. In stage II, a Bidirectional Long Short-Term Memory (Bi-LSTM) stacked on top of a unidirectional LSTM encodes the structure of the text sequences, while in stage III, a two-layer Gated Recurrent Unit (GRU) network is used to model the text sequences. The proposed system was implemented using Python 3.7, TensorFlow 2.6.0 with Keras, and an NVIDIA Graphics Processing Unit (GPU). The models were trained on the Pride and Prejudice corpus from the Project Gutenberg library of ebooks. Evaluation was performed by predicting the next three words for each of ten input text sequences. In the experiments, the two-layer LSTM model achieved an accuracy of 83%, the Bi-LSTM stacked on a unidirectional LSTM achieved 79%, and the two-layer GRU achieved 81%. Regarding predictions, the two-layer LSTM predicted all ten sequences correctly, the Bi-LSTM stacked on the unidirectional LSTM predicted eight correctly, and the two-layer GRU predicted seven correctly.
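
The three stages described above can be sketched as Keras Sequential models. The sketch below is illustrative only: the vocabulary size, sequence length, embedding dimension, and hidden-unit counts are assumptions, as the abstract does not report these hyperparameters; only the layer stacking (embedding plus two LSTMs, Bi-LSTM over a unidirectional LSTM, and two GRUs) follows the paper's description.

```python
# Minimal sketch of the three stacked architectures (TensorFlow 2.6 / Keras).
# VOCAB_SIZE, SEQ_LEN, EMBED_DIM, and the 128 hidden units are hypothetical values.
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 10_000   # assumed vocabulary size
SEQ_LEN = 10          # assumed input sequence length
EMBED_DIM = 100       # assumed embedding dimension

def two_layer_lstm():
    # Stage I: learned embedding followed by two stacked unidirectional LSTMs
    return models.Sequential([
        layers.Embedding(VOCAB_SIZE, EMBED_DIM, input_length=SEQ_LEN),
        layers.LSTM(128, return_sequences=True),
        layers.LSTM(128),
        layers.Dense(VOCAB_SIZE, activation="softmax"),
    ])

def bilstm_on_lstm():
    # Stage II: Bi-LSTM stacked on top of a unidirectional LSTM
    return models.Sequential([
        layers.Embedding(VOCAB_SIZE, EMBED_DIM, input_length=SEQ_LEN),
        layers.LSTM(128, return_sequences=True),
        layers.Bidirectional(layers.LSTM(128)),
        layers.Dense(VOCAB_SIZE, activation="softmax"),
    ])

def two_layer_gru():
    # Stage III: two stacked GRU layers
    return models.Sequential([
        layers.Embedding(VOCAB_SIZE, EMBED_DIM, input_length=SEQ_LEN),
        layers.GRU(128, return_sequences=True),
        layers.GRU(128),
        layers.Dense(VOCAB_SIZE, activation="softmax"),
    ])

# Example: compile one variant for next-word classification over the vocabulary.
model = two_layer_lstm()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```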
