Abstract

Sentiment classification is a fundamental task in many natural language processing applications. Neural networks have achieved great success on sentiment classification in recent years, since recurrent neural networks and long short-term memory networks can handle sequences of different lengths and capture contextual semantic information. However, the effectiveness of these methods is limited when extracting contextual information from relatively long texts. Therefore, in our model, we apply bidirectional gated recurrent units to capture as much contextual information as possible when learning word representations, which may effectively reduce noise compared with other methods. We also propose a novel loss function, drop loss (DL), which makes the model focus on hard examples (examples that are easily classified incorrectly) in order to improve the accuracy of the model. We experiment on four commonly used datasets, and the results show that the proposed method performs well on all four while needing fewer parameters than recent benchmarks such as CoVe, ULMFiT, embeddings from language models, and bidirectional encoder representations from transformers. Furthermore, we demonstrate that the classification performance of existing shallow network models can be significantly improved by using DL. In particular, the accuracy of the CNN+LSTM model improves by 9% on the IMDB-10 dataset.
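The abstract does not spell out the exact form of drop loss. As an illustration only, the PyTorch sketch below assumes a simple variant that zeroes out the loss of "easy" examples whose correct-class probability already exceeds a threshold, so gradients come only from hard examples; the name `drop_loss`, the `drop_threshold` parameter, and the masking rule are assumptions, not the authors' definition.

```python
import torch
import torch.nn.functional as F

def drop_loss(logits, targets, drop_threshold=0.7):
    """Hypothetical drop-loss sketch: ignore examples the model already
    classifies confidently and average the loss over the remaining hard ones.
    The threshold value and masking rule are illustrative assumptions."""
    # Per-example cross-entropy, kept unreduced so it can be masked.
    per_example = F.cross_entropy(logits, targets, reduction="none")
    # Probability assigned to the correct class for each example.
    correct_prob = torch.softmax(logits, dim=-1).gather(1, targets.unsqueeze(1)).squeeze(1)
    # Keep only "hard" examples: those the model is not yet confident about.
    hard_mask = (correct_prob < drop_threshold).float()
    # Guard against division by zero when every example in the batch is easy.
    return (per_example * hard_mask).sum() / hard_mask.sum().clamp(min=1.0)

# Toy usage: 4 examples, 3 classes.
logits = torch.randn(4, 3, requires_grad=True)
targets = torch.tensor([0, 2, 1, 2])
loss = drop_loss(logits, targets)
loss.backward()
```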

Highlights

  • Sentiment classification is one of the most widely used natural language processing (NLP) techniques and has been applied to many areas, such as e-commerce websites, stock forecasting [1], and political orientation analysis [2]–[4]

  • We evaluate our model on the following datasets: the Stanford massive open online courses (MOOCs) posts dataset, IMDB, and the Sentiment Treebank

  • The results show that our approach may suffer less from the data sparsity problem and capture more contextual information of features compared with traditional methods using BoW features


Summary

Introduction

Sentiment classification is one of the most widely used natural language processing (NLP) techniques and has been applied to many areas, such as e-commerce websites, stock forecasting [1], and political orientation analysis [2]–[4]. In the sentiment classification task, feature-based representations play an important role and are often based on the bag-of-words (BoW) model [5], where bi-grams or larger n-grams are designed to represent features. A BoW model is used to represent documents by Pang et al. [6] and Wang and Manning [7], who both build SVM classifiers for text classification. Although SVM is an extremely strong performer, the problem of data sparsity when using BoW features heavily affects classification accuracy [8]. Word embedding [9] has brought new inspiration for solving the data sparsity problem in many NLP tasks [10], because it can represent each word as a low-dimensional, continuous, real-valued vector [11]. Rao et al. [21], Tang et al. [11], and Xu et al. [22] utilized word embeddings to represent words before using neural networks to learn word representations.
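To make the contrast concrete, here is a minimal sketch (assuming scikit-learn and NumPy are available) of a sparse bag-of-words count vector versus a dense embedding lookup; the tiny corpus, embedding size, and random embedding table are illustrative assumptions, not the representations used in the paper.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the movie was great", "the movie was terrible"]

# Bag-of-words: each document becomes a count vector whose length equals the
# vocabulary size, so most entries are zero for realistic vocabularies.
vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(docs).toarray()
print(vectorizer.get_feature_names_out())  # the vocabulary
print(bow)                                 # sparse, high-dimensional counts

# Word embeddings: each word maps to a dense, low-dimensional real-valued
# vector. The random table below stands in for learned embeddings.
embedding_dim = 4                          # illustrative size
vocab = vectorizer.get_feature_names_out()
embeddings = np.random.randn(len(vocab), embedding_dim)
word_to_id = {w: i for i, w in enumerate(vocab)}
doc_vectors = [embeddings[[word_to_id[w] for w in doc.split()]] for doc in docs]
print(doc_vectors[0].shape)  # (num words, embedding_dim): dense representation
```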


