Abstract

We describe sentiment analysis experiments performed on the Lithuanian Internet comment dataset using traditional machine learning approaches, Naïve Bayes Multinomial (NBM) and Support Vector Machine (SVM), and deep learning approaches, Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN). The traditional machine learning techniques were used with features based on lexical, morphological, and character information. The deep learning approaches were applied on top of two types of word embeddings: Word2Vec continuous bag-of-words with negative sampling, and FastText. Both traditional and deep learning approaches were applied to the positive/negative/neutral sentiment classification task on the balanced and full versions of the dataset. The best deep learning result (an accuracy of 0.706) was achieved on the full dataset with CNN applied on top of the FastText embeddings, with emoticons replaced and diacritics eliminated. The traditional machine learning approaches performed best (an accuracy of 0.735) on the full dataset with the NBM method, replaced emoticons, restored diacritics, and lemma unigrams as features. Although the traditional machine learning approaches were superior to the deep learning methods, deep learning demonstrated good results when applied to small datasets.
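The traditional pipeline summarized above (emoticon replacement, diacritic handling, unigram features, NBM) can be illustrated with a minimal sketch. It is not the study's implementation. Assumptions: the `EMOTICONS` map and its `EMO_POS`/`EMO_NEG` tokens, the toy training comments, and the smoothing constant `alpha` are hypothetical; word forms stand in for lemmas because no Lithuanian lemmatizer is included; and diacritic restoration, which the best NBM run used, is not sketched (only elimination is, via the `drop_diacritics` flag).

```python
import math
import re
import unicodedata
from collections import Counter, defaultdict

# Hypothetical emoticon map: the study's actual replacement list is not reproduced here.
EMOTICONS = {":-)": " EMO_POS ", ":)": " EMO_POS ", ":-(": " EMO_NEG ", ":(": " EMO_NEG "}


def strip_diacritics(text):
    """Eliminate diacritics (e.g. 'ž' -> 'z') by decomposing and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def preprocess(text, drop_diacritics=True):
    """Replace emoticons with tokens, optionally strip diacritics, return lowercase unigrams."""
    for emo in sorted(EMOTICONS, key=len, reverse=True):  # longer emoticons first
        text = text.replace(emo, EMOTICONS[emo])
    if drop_diacritics:
        text = strip_diacritics(text)
    return re.findall(r"\w+", text.lower())


class MultinomialNB:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing over unigram counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log P(class) + sum of log P(word | class); words unseen in training are skipped.
            score = math.log(count / n_docs)
            for w in tokens:
                if w in self.vocab:
                    p = (self.word_counts[label][w] + self.alpha) / (self.totals[label] + self.alpha * v)
                    score += math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy Lithuanian comments, invented for illustration (not from the study's dataset).
TRAIN = [
    ("Puikus straipsnis :)", "positive"),
    ("Labai gera žinia", "positive"),
    ("Blogas sprendimas :(", "negative"),
    ("Baisu ir blogai", "negative"),
    ("Rytoj bus susitikimas", "neutral"),
    ("Posėdis vyks antradienį", "neutral"),
]

model = MultinomialNB().fit([preprocess(t) for t, _ in TRAIN], [y for _, y in TRAIN])
print(model.predict(preprocess("gera žinia :)")))  # positive
```

Replacing emoticons with ordinary tokens lets the classifier treat them as strong sentiment-bearing words, and stripping diacritics merges spellings such as "žinia" and "zinia" into one feature.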

Highlights

  • The Internet has changed the way people express their beliefs and sentiments about products or services, events, topics, interactions, etc.

  • The results of sentiment analysis research are mixed, while deep learning has recently become the dominant paradigm. For this reason, this research investigates the impact of deep learning approaches on the sentiment analysis task.

  • For Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN), we present the highest accuracy values obtained with two types of embeddings: 300-dimensional continuous bag-of-words with negative sampling (Word2Vec) and FastText.
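How a CNN is "applied on top of" word embeddings can be sketched as the forward pass of a generic single-layer text CNN: each filter slides over windows of consecutive word vectors, a ReLU is applied, max-over-time pooling keeps the strongest response, and a dense layer with softmax scores the three sentiment classes. This is an illustration under stated assumptions, not the architecture used in the experiments: the 2-dimensional embeddings, filter shapes, and weights are hypothetical toy values, and training is omitted.

```python
import math
from typing import List

Vector = List[float]
LABELS = ["positive", "negative", "neutral"]


def conv_max_pool(sentence: List[Vector], filt: List[Vector], bias: float) -> float:
    """One filter of height k over word-vector windows, ReLU, then max-over-time pooling."""
    k = len(filt)
    if len(sentence) < k:
        raise ValueError("sentence is shorter than the filter height")
    activations = []
    for i in range(len(sentence) - k + 1):
        s = bias
        for j in range(k):
            s += sum(w * x for w, x in zip(filt[j], sentence[i + j]))
        activations.append(max(0.0, s))  # ReLU
    return max(activations)


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cnn_forward(sentence, filters, biases, out_weights, out_biases) -> List[float]:
    """Pooled filter outputs form a feature vector; a dense layer maps it to class probabilities."""
    features = [conv_max_pool(sentence, f, b) for f, b in zip(filters, biases)]
    logits = [sum(w * h for w, h in zip(row, features)) + b for row, b in zip(out_weights, out_biases)]
    return softmax(logits)


# Hypothetical toy values: a 3-word sentence with 2-dimensional embeddings and hand-set weights.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filters = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0]]]  # one bigram filter, one unigram filter
biases = [0.0, 0.0]
out_weights = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # one row per label in LABELS
out_biases = [0.0, 0.0, 0.0]

probs = cnn_forward(sentence, filters, biases, out_weights, out_biases)
print(LABELS[probs.index(max(probs))])  # positive
```

In a real setup the rows of `sentence` would be Word2Vec or FastText vectors looked up for each token, and the filters and dense weights would be learned from labelled comments.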


Introduction

The Internet has changed the way people express their beliefs and sentiments about products or services, events, topics, interactions, etc. This is mainly done via social networks, review websites, web forums, blogs, and Internet comments. These texts are rich in sentiment and are valuable for companies or individuals who want to improve their product marketing strategies and respond. Sentiment analysis (or classification) methods can be grouped into two main categories: dictionary-based and machine learning. The dictionary-based methods (such as [12,13,14,15]) typically rely
