Performance Study of N-grams in the Analysis of Sentiments

O. E. Ojo, O. O. Adebanji, H. Calvo, A. Gelbukh

doi:10.46481/jnsps.2021.201

Abstract

In this work, we investigate the use of n-grams to classify sentiments with different machine learning and deep learning methods. We apply this approach, which combines existing techniques, to the problem of predicting sequence tags in order to understand the advantages and problems of using unigrams, bigrams and trigrams to analyse economic texts. Our study aims to fill the gap by evaluating the performance of these n-gram features on different texts in the economic domain using nine sentiment analysis techniques. We show that comparing the performance of these features across different datasets and multiple learning techniques yields useful insights. The evaluation assesses the precision, recall, F1-score and accuracy of the output of the machine learning algorithms used. The methods were tested on the Amazon, IMDB, Reuters and Yelp economic review datasets, and our comprehensive experiments show the effectiveness of n-grams in the analysis of sentiments.
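To make the terms in the abstract concrete, the following is a minimal sketch of how unigram, bigram and trigram features can be extracted from a sentence, and how precision, recall, F1-score and accuracy can be computed for a binary sentiment label. It is an illustration only and not the authors' pipeline: the function names (`ngrams`, `ngram_features`, `binary_metrics`), the whitespace tokenisation and the example sentence are all assumptions introduced here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, max_n=3):
    """Count all 1-grams up to max_n-grams of a text.

    Assumption: lower-casing and whitespace splitting stand in for
    whatever tokeniser a real study would use.
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F1 and accuracy for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    # Hypothetical economic-news sentence, used only for illustration.
    features = ngram_features("Profits rose sharply")
    for gram, count in sorted(features.items(), key=lambda kv: len(kv[0])):
        print(" ".join(gram), count)
    print(binary_metrics([1, 0, 1, 1], [1, 1, 0, 1]))
```

In a full experiment, the resulting counts would typically be turned into feature vectors and passed to each classifier, with the metrics computed on held-out data.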

Highlights

Different works have been carried out in the field of sentiment analysis [1, 7, 3, 4].
We identified some common techniques used in recent studies [6, 2, 4, 16], namely Decision Tree Classifier (DTC), Gradient Boosting Classifier (GBC), Naive Bayes Algorithm (NBA) and Random Forest Classifier (RFC).

Summary

Introduction

Background and Related Work

Different works have been carried out in the field of sentiment analysis [1, 7, 3, 4]. Machine learning techniques have shown good results in analysing sentiments in text [6, 2, 4] and in other tasks such as part-of-speech (PoS) tagging [9] and named entity recognition (NER) [10]. Linear statistical models, such as conditional random fields (CRF) and hidden Markov models (HMM), are NLP approaches used for sequence tagging with a long history of excellent performance. Pekka et al. [11] combined categorial grammar, annotation, lexicon acquisition and semantic networks to analyse the sentiment of text and to define its tags. They investigated how overall phrase structure and domain-specific language usage could aid in the detection of semantic orientations in financial and economic news.

Objectives

Methods

Results

Conclusion