Abstract

This work belongs to the field of sentiment analysis, in particular to opinion and emotion classification using a lexicon-based approach. It addresses several problems related to increasing the effectiveness of opinion classification. The first problem concerns lexicon labelling. Human labelling of emotions is often too subjective and ambiguous, so we examine whether it can be replaced by automatic labelling. This paper presents experimental results for labelling with a nature-inspired algorithm, particle swarm optimization (PSO). This optimization method repeatedly labels all words in the lexicon and evaluates the effectiveness of opinion classification with the labelled lexicon until it finds the optimal labels for the words. The second problem is that texts containing no words from the lexicon cannot be classified by the lexicon-based approach. Therefore, an auxiliary approach based on a machine learning method is integrated into the method. This hybrid approach classifies more than 99% of texts and achieves better results than the original lexicon-based approach. The final hybrid model can be used for emotion analysis in human–robot interactions.
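The labelling loop described above (label every lexicon word, score the labelled lexicon by classification effectiveness, repeat until the labels stop improving) can be sketched as a plain global-best PSO. This is a minimal illustration under stated assumptions, not the authors' implementation: polarities are treated as continuous values in [-1, 1], effectiveness is taken to be accuracy on labelled texts (with unclassifiable texts counted as errors), and the helper names `classify`, `fitness`, and `pso_label`, as well as the inertia and acceleration coefficients, are hypothetical. The bare-bones variant (BBPSO) that the paper also studies is not shown.

```python
import random

def classify(text, lexicon):
    """Lexicon-based polarity: +1, -1, or None when no decision is possible."""
    hits = [lexicon[w] for w in text.lower().split() if w in lexicon]
    score = sum(hits)
    if not hits or score == 0:
        return None  # no lexicon word, or positive and negative words cancel out
    return 1 if score > 0 else -1

def fitness(polarities, words, data):
    """Assumed fitness: accuracy of lexicon classification; unclassified texts count as errors."""
    lexicon = dict(zip(words, polarities))
    correct = sum(1 for text, label in data if classify(text, lexicon) == label)
    return correct / len(data)

def pso_label(words, data, n_particles=20, iters=100,
              lo=-1.0, hi=1.0, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO: each particle is one candidate labelling of the whole lexicon."""
    rng = random.Random(seed)
    dim = len(words)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, words, data) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards the particle's own best + pull towards the swarm's best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep each polarity inside the assumed range [lo, hi].
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            f = fitness(pos[i], words, data)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return dict(zip(words, gbest)), gbest_fit

if __name__ == "__main__":
    words = ["good", "great", "bad", "awful"]
    data = [("good movie", 1), ("great acting", 1), ("bad plot", -1),
            ("awful ending", -1), ("good but awful", -1)]
    lexicon, acc = pso_label(words, data)
    print(lexicon, acc)
```

Because the fitness is evaluated by running the classifier itself, no human polarity judgements enter the loop; the only supervision is the opinion label of each training text.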

Highlights

  • Online discussions generate huge amounts of data every day, which is hard for humans to process manually

  • The main purpose of this paper is to find the best method for labelling a lexicon for a lexicon-based approach to opinion classification

  • Our baseline was naturally the lexicon-based approach, not a machine learning approach


Summary

Introduction

Online discussions generate huge amounts of data every day, which is hard for humans to process manually. Particle swarm optimization (PSO) optimizes the opinion-polarity values of all words in the labelled lexicon, with the fitness function given by the effectiveness of sentiment analysis using that lexicon. This automatic labelling avoids the subjectivity of human labelling. We extend the lexicon-based approach with a machine learning module in order to classify posts that the lexicon-based approach cannot classify. This module was trained on training data labelled by the lexicon-based approach to opinion classification. The result is a hybrid approach that integrates a machine learning model into the sentiment analysis method, so that texts containing no lexicon words can also be classified. It is generally assumed that deep learning can achieve better results than the Naive Bayes method in text processing.
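The hybrid scheme described above can be sketched as follows: the lexicon classifies what it can, its outputs serve as training labels for an auxiliary model, and that model handles only the texts the lexicon leaves unclassified. Naive Bayes is the learning method named in the paper's outline, but this is an illustrative sketch rather than the authors' code; the whitespace tokenizer, add-one smoothing, integer labels (+1/-1), and the names `lexicon_classify`, `NaiveBayes`, and `hybrid_classify` are all assumptions.

```python
import math
from collections import Counter, defaultdict

def lexicon_classify(text, lexicon):
    """Sum of word polarities: +1 / -1, or None if the lexicon cannot decide."""
    hits = [lexicon[w] for w in text.lower().split() if w in lexicon]
    score = sum(hits)
    if not hits or score == 0:
        return None
    return 1 if score > 0 else -1

class NaiveBayes:
    """Multinomial Naive Bayes on whitespace tokens with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, y in zip(texts, labels):
            self.word_counts[y].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {y: sum(c.values()) for y, c in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)

        def log_posterior(y):
            # log P(y) + sum of log P(word | y), smoothed so unseen words do not zero it out.
            lp = math.log(self.class_counts[y] / n_docs)
            for w in text.lower().split():
                lp += math.log((self.word_counts[y][w] + 1) / (self.totals[y] + v))
            return lp

        return max(self.class_counts, key=log_posterior)

def hybrid_classify(train_texts, test_texts, lexicon):
    # 1. Label the training texts with the lexicon; keep only those it could classify.
    pairs = [(t, lexicon_classify(t, lexicon)) for t in train_texts]
    pairs = [(t, y) for t, y in pairs if y is not None]
    nb = NaiveBayes().fit([t for t, _ in pairs], [y for _, y in pairs])
    # 2. Lexicon first; fall back to Naive Bayes only when the lexicon returns None.
    results = []
    for t in test_texts:
        y = lexicon_classify(t, lexicon)
        results.append(y if y is not None else nb.predict(t))
    return results
```

For example, with the lexicon `{"good": 1.0, "great": 1.0, "bad": -1.0, "awful": -1.0}` and four training posts that each contain lexicon words, a test post such as "plot twist" has no lexicon word and is labelled by the Naive Bayes fallback instead of being left unclassified. The sketch assumes the lexicon can classify at least one training text of each class; otherwise the fallback has nothing to learn from.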

Related Works
Nature-Inspired Optimization
Naive Bayes Learning Method
Lexicon Generation
Lexicon Labelling Using Particle Swarm Optimization
Labelling by Bare-Bones Particle Swarm Optimization
Fitness Function for Optimization
Data Sets
Experiments with PSO and BBPSO Labelling
Comparison of PSO and BBPSO Labelling with Human Labelling
Used Methods
Distribution of Values of Polarities in Generated Lexicons
Topic Identification in Opinion Classification
Findings
Discussion
