Abstract

As the amount of content created on social media constantly increases, people express more and more opinions and sentiments on various subjects. In this respect, sentiment analysis and opinion mining techniques can be valuable for the automatic analysis of huge textual corpora (comments, reviews, tweets, etc.). Despite the advances in text mining algorithms, deep learning techniques, and text representation models, results in such tasks are very good only for a few high-density languages (e.g., English) that possess large training corpora and rich linguistic resources; for lower-density languages, there is still room for improvement. In this direction, the current work employs various language models for representing social media texts in the Greek language, together with text classifiers, to detect the polarity of opinions expressed on social media. The experimental results on a related dataset collected by the authors are promising: various classifiers based on the language models (naive Bayes, random forests, support vector machines, logistic regression, deep feed-forward neural networks) outperform those based on word or sentence embeddings (word2vec, GloVe), achieving a classification accuracy of more than 80%. Additionally, a new language model for Greek social media has been trained on the aforementioned dataset, showing that language models based on domain-specific corpora can improve on the performance of generic language models by a margin of 2%. Finally, the resulting models are made freely available to the research community.

Highlights

  • The first experiment comparatively evaluates the performance of the different pretrained Greek language models in a binary sentiment prediction task, using only the positive and negative documents from the annotated dataset

  • In order to evaluate the effect of the training dataset size and find the language models that perform better with fewer and with more samples, respectively, we kept a held-out set of 2000 annotated documents as a test set and trained on training sets of increasing size

  • The experimental evaluation performed in this work has validated the claim that language models can perform better than traditional representation models, such as the vector space model
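
The size-increasing training protocol described in the highlights can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the documents are replaced by synthetic feature vectors, a nearest-centroid classifier stands in for the classifiers named in the abstract, and all function names, the training sizes, and the feature dimension are hypothetical. Only the held-out test set of 2000 documents comes from the text.

```python
import random


def make_synthetic_docs(n, dim=8, seed=0):
    """Return (features, labels): two noisy clusters standing in for
    embedded positive/negative documents (synthetic, not the paper's data)."""
    rng = random.Random(seed)
    feats, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        center = 1.0 if y == 1 else -1.0
        feats.append([center + rng.gauss(0, 2.0) for _ in range(dim)])
        labels.append(y)
    return feats, labels


def fit_centroids(X, y):
    """Nearest-centroid 'training': the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sqdist(centroids[label]))


def accuracy(centroids, X, y):
    correct = sum(predict(centroids, x) == label for x, label in zip(X, y))
    return correct / len(y)


def learning_curve(X, y, test_size, train_sizes, seed=0):
    """Hold out a fixed test set, then train on growing prefixes of the rest.

    Returns a list of (training_size, test_accuracy) pairs.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    test_idx, pool_idx = idx[:test_size], idx[test_size:]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    results = []
    for n in train_sizes:
        train_idx = pool_idx[:n]  # each larger set contains the smaller ones
        centroids = fit_centroids([X[i] for i in train_idx],
                                  [y[i] for i in train_idx])
        results.append((n, accuracy(centroids, X_test, y_test)))
    return results


if __name__ == "__main__":
    X, y = make_synthetic_docs(4000)
    # 2000 held-out test documents, as in the highlights; sizes are illustrative.
    for n, acc in learning_curve(X, y, test_size=2000,
                                 train_sizes=[250, 500, 1000, 2000]):
        print(f"train size {n:5d} -> test accuracy {acc:.3f}")
```

Keeping the test set fixed while the training set grows means that differences between points on the curve reflect the amount of training data rather than a changing evaluation sample.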


Summary

Introduction

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Nowadays, social media are the biggest repository of public opinion about everything, from places, companies, and persons to products and ideas. This is mainly because people prefer to express and share their opinions on daily, local, or global issues on social media and tend to comment on other people’s views, which in turn creates a huge amount of opinionated data. The proper management and analysis of such data may uncover interests, beliefs, trends, risks, and opportunities that are valuable for marketing companies that design campaigns and promote products, persons, and concepts [1].

