Abstract

Sentiment analysis is the process of analyzing, processing, summarizing, and reasoning over subjective text that carries emotional color; it is a research direction within Natural Language Processing (NLP). It is often used to extract people's attitudes toward someone or something, which can help users discover potential problems to improve on or predict. As one of the main sources of online media data, film review information is frequently used as a dataset in sentiment analysis, and researchers have proposed many models to analyze film review datasets. Accuracy, precision, recall, and F1-score are important criteria for measuring the quality of a model. To improve these criteria, a new model is proposed in this paper. The new model combines a bidirectional Long Short-Term Memory network (biLSTM) or a bidirectional Gated Recurrent Unit (biGRU) with an Enhanced Multi-Head Self-Attention mechanism. The Enhanced Multi-Head Self-Attention is a two-layer modified Transformer encoder in which the masking operation and the final feed-forward layer are removed. In addition, the loss function of the new model is the sum of a weighted root mean square error (RMSE) and the cross-entropy loss; this combination improves the autoencoder's ability to reconstruct its input and thereby improves classification accuracy. The proposed model is an autoencoder classification model: biLSTM or biGRU layers serve as the encoder and decoder at the two ends of the network, while the Enhanced Multi-Head Self-Attention encodes inter-sentence information as the middle hidden layer. A four-layer autoencoder network is constructed to perform sentiment analysis on movie reviews. The IMDB movie review dataset and the SST-2 sentiment dataset are used in the experiments. Experimental results show that the proposed model outperforms the baseline models in terms of accuracy, precision, recall, and F1-score, and that biLSTM performs better than biGRU within the model. Finally, Bidirectional Encoder Representations from Transformers (BERT) is used as the pre-training structure in place of word2vec; compared with the BERT-based baseline model, the proposed model again performs better.
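
To make the described architecture and combined loss concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation. The layer sizes, number of attention heads, mean pooling for classification, use of the embedded input as the reconstruction target, and the RMSE weight alpha are all illustrative assumptions; the attention bottleneck below also omits the feed-forward sublayers for brevity rather than reproducing the paper's exact modified Transformer encoder.

import torch
import torch.nn as nn

class AttentiveBiLSTMAutoencoder(nn.Module):
    """Sketch: biLSTM encoder/decoder with an unmasked two-layer
    multi-head self-attention bottleneck and a classification head."""

    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128,
                 num_heads=4, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # biLSTM encoder: 2 * hidden_dim features per token
        self.encoder = nn.LSTM(embed_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        # Two self-attention layers stand in for the Enhanced
        # Multi-Head Self-Attention (no masking applied).
        self.attn1 = nn.MultiheadAttention(2 * hidden_dim, num_heads,
                                           batch_first=True)
        self.attn2 = nn.MultiheadAttention(2 * hidden_dim, num_heads,
                                           batch_first=True)
        # biLSTM decoder reconstructs the embedded input sequence
        self.decoder = nn.LSTM(2 * hidden_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        self.reconstruct = nn.Linear(2 * hidden_dim, embed_dim)
        self.classify = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        emb = self.embedding(token_ids)        # (B, T, E)
        enc, _ = self.encoder(emb)             # (B, T, 2H)
        h, _ = self.attn1(enc, enc, enc)       # unmasked self-attention
        h, _ = self.attn2(h, h, h)
        dec, _ = self.decoder(h)               # (B, T, 2H)
        recon = self.reconstruct(dec)          # (B, T, E)
        logits = self.classify(h.mean(dim=1))  # assumed mean pooling
        return logits, recon, emb

def combined_loss(logits, labels, recon, target, alpha=0.5):
    """Weighted RMSE (reconstruction) plus cross-entropy (classification)."""
    rmse = torch.sqrt(nn.functional.mse_loss(recon, target))
    ce = nn.functional.cross_entropy(logits, labels)
    return alpha * rmse + ce

# Example usage on random data (shapes only, for illustration):
model = AttentiveBiLSTMAutoencoder(vocab_size=20000)
token_ids = torch.randint(0, 20000, (8, 50))   # 8 reviews, 50 tokens each
labels = torch.randint(0, 2, (8,))
logits, recon, emb = model(token_ids)
loss = combined_loss(logits, labels, recon, emb.detach())

Detaching the reconstruction target prevents the embedding layer from collapsing toward a trivial solution in this sketch; how the original model handles the reconstruction target is not specified in the abstract.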
