A Novel, Gradient Boosting Framework for Sentiment Analysis in Languages where NLP Resources Are Not Plentiful: A Case Study for Modern Greek

Vasileios Athanasiou,Manolis Maragoudakis

doi:10.3390/a10010034

Abstract

Sentiment analysis has played a primary role in text classification. It is an undoubted fact that some years ago, textual information was spreading in manageable rates; however, nowadays, such information has overcome even the most ambiguous expectations and constantly grows within seconds. It is therefore quite complex to cope with the vast amount of textual data particularly if we also take the incremental production speed into account. Social media, e-commerce, news articles, comments and opinions are broadcasted on a daily basis. A rational solution, in order to handle the abundance of data, would be to build automated information processing systems, for analyzing and extracting meaningful patterns from text. The present paper focuses on sentiment analysis applied in Greek texts. Thus far, there is no wide availability of natural language processing tools for Modern Greek. Hence, a thorough analysis of Greek, from the lexical to the syntactical level, is difficult to perform. This paper attempts a different approach, based on the proven capabilities of gradient boosting, a well-known technique for dealing with high-dimensional data. The main rationale is that since English has dominated the area of preprocessing tools and there are also quite reliable translation services, we could exploit them to transform Greek tokens into English, thus assuring the precision of the translation, since the translation of large texts is not always reliable and meaningful. The new feature set of English tokens is augmented with the original set of Greek, consequently producing a high dimensional dataset that poses certain difficulties for any traditional classifier. Accordingly, we apply gradient boosting machines, an ensemble algorithm that can learn with different loss functions providing the ability to work efficiently with high dimensional data. Moreover, for the task at hand, we deal with a class imbalance issues since the distribution of sentiments in real-world applications often displays issues of inequality. For example, in political forums or electronic discussions about immigration or religion, negative comments overwhelm the positive ones. The class imbalance problem was confronted using a hybrid technique that performs a variation of under-sampling the majority class and over-sampling the minority class, respectively. Experimental results, considering different settings, such as translation of tokens against translation of sentences, consideration of limited Greek text preprocessing and omission of the translation phase, demonstrated that the proposed gradient boosting framework can effectively cope with both high-dimensional and imbalanced datasets and performs significantly better than a plethora of traditional machine learning classification approaches in terms of precision and recall measures.

Highlights

The way that machines and humans search, retrieve and manage information changes at a high rate
The presented research, which is an extension of [54], studied the use of machine learning in sentiment analysis tasks, in situations where the text is given in a language with a relatively minimal set of linguistic analysis gazetteers and modules, such as the POS tagger, syntactic shallow parsers, etc
Modern Greek has its place to this category of under-resourced languages, and the present work collected real-world sentiment data, obtained from Web 2.0 platforms, and followed an idea of using machine translation of Greek tokens as a feature generation step

Summary

Introduction

The way that machines and humans search, retrieve and manage information changes at a high rate. The potential information resources have increased because of the advent of. User-generated content is available from a large pool of sources, such as online newspapers, blogs, e-commerce sites, social media, etc. The quantity of information is continuously increasing in every domain. It was online retailers, such as Amazon, that identified possible profits by exploiting users’ opinions. Nowadays, almost everyone has realized that quality of services, marketing and maximization of sales cannot be achieved without considering the textual content that is generated by Internet users. The task of identifying relevant information from the vast amount of human communication information over the Internet is of utmost importance for robust sentiment analysis. The existence of opinion data has resulted in the development of

Objectives

Methods

Results

Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Journal: Algorithms	Publication Date: Mar 6, 2017
Citations: 33	License type: CC BY 4.0

R Discovery Prime

R Discovery Prime

A Novel, Gradient Boosting Framework for Sentiment Analysis in Languages where NLP Resources Are Not Plentiful: A Case Study for Modern Greek

Abstract

Highlights

Summary

Talk to us

Similar Papers

More From: Algorithms

Lead the way for us

Similar Papers

Lost in Translation: Viability of Machine Translation for Cross Language Sentiment Analysis
Balamurali A.R ... Mitesh M Khapra
-
Balamurali A.R, et. al.Balamurali A.R ... Mitesh M Khapra
01 Jan 2013
01 Jan 2013

Imbalance Learning and Its Application on Medical Datasets
Yachao Shao
-
Yachao ShaoYachao Shao
21 Feb 2022
21 Feb 2022

Classifying Imbalanced Data Sets by a Novel RE-Sample and Cost-Sensitive Stacked Generalization Method
Jianhong Yan ... Suqing Han
Mathematical Problems in Engineering | VOL. 2018
Jianhong Yan, et. al.Jianhong Yan ... Suqing Han
01 Jan 2018
Mathematical Problems in Engineering | VOL. 2018

Resampling Imbalanced Data and Impact of Attribute Selection Methods in High Dimensional Data
K Ulaga Priya ... S Pushpa
-
K Ulaga Priya, et. al.K Ulaga Priya ... S Pushpa
01 Jan 2021
01 Jan 2021

Editage

Paperpal

R Discovery

Mind the Graph

R Discovery Prime

R Discovery Prime

A Novel, Gradient Boosting Framework for Sentiment Analysis in Languages where NLP Resources Are Not Plentiful: A Case Study for Modern Greek

Abstract

Highlights

Summary

Talk to us

Similar Papers

More From: Algorithms