Sentiment analysis and the complex natural language

Muhammad Taimoor Khan,Mehr Durrani,Kamran Habib Khan,Shehzad Khalid,Irum Inayat,Armughan Ali

doi:10.1186/s40294-016-0016-9

Abstract

There is a huge amount of content produced online by amateur authors, covering a large variety of topics. Sentiment analysis (SA) extracts and aggregates users’ sentiments towards a target entity. Machine learning (ML) techniques are frequently used because natural language data is abundant and has definite patterns. ML techniques adapt to domain-specific solutions with high accuracy, depending on the feature set used. Lexicon-based techniques, which use an external dictionary, are independent of the data and so avoid overfitting, but they also miss context in specialized domains. Corpus-based statistical techniques require large amounts of data to stabilize. Complex network based techniques are highly resourceful, preserving order, proximity, context and relationships. Recent applications incorporate platform-specific structural information, i.e. meta-data. New sub-domains have been introduced, such as influence analysis, bias analysis, and data leakage analysis. The nature of the data is also evolving: transcribed customer-agent phone conversations are now also used for sentiment analysis. This paper reviews sentiment analysis techniques and highlights the need to address open challenges specific to natural language processing (NLP). Without resolving the complex NLP challenges, ML techniques cannot make considerable advancements. The open issues and challenges in the area are discussed, stressing the need for standard datasets and evaluation methodology. The paper also emphasizes the need for better language models that can capture context and proximity.

Highlights

Sentiment analysis (Pang and Lillian 2008) is a type of text classification that deals with subjective statements
It is also known as opinion mining, since it processes opinions in order to learn about public perception
Opinion mining has its boundaries extended from computer science to management sciences

Summary

Introduction

Sentiment analysis (Pang and Lillian 2008) is a type of text classification that deals with subjective statements. The sentiments in a review document are classified by identifying and separating all the positive and negative opinion words. Unsupervised sentiment analysis techniques do not require training data and instead rely on semantic orientation. They make use of lexicons to identify the positive or negative semantics of opinion words. Statistical analysis techniques are also unsupervised, identifying the orientation of sentiment words through statistical evaluation. They require a large volume of data for high accuracy. Some of the challenges are common to opinion mining in general, while others are tied to particular sources and contexts, depending on the domain of the dataset. These issues affect the performance of machine learning techniques, which have little control over them. For example, a sentence may contain the word "not" without its meaning being inverted, as would normally be the case with a negation word.
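
The lexicon-based classification described above can be illustrated with a minimal Python sketch. It is not the method of any particular system reviewed here: the word lists `POSITIVE`, `NEGATIVE` and `NEGATIONS`, the function name `classify`, and the rule that a negation word flips only the token immediately after it are all assumptions made for illustration. A real lexicon-based technique would draw on a curated external dictionary rather than a hand-written set.

```python
import re

# Toy lexicon, assumed for illustration only; a real system would load
# a curated external sentiment dictionary instead.
POSITIVE = {"good", "great", "excellent", "love", "reliable"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "slow"}
NEGATIONS = {"not", "no", "never"}


def classify(review: str) -> str:
    """Label a review as positive, negative or neutral by counting opinion words."""
    tokens = re.findall(r"[a-z']+", review.lower())
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATIONS:
            # Simplifying assumption: a negation word flips only the next token.
            negate = True
            continue
        # +1 for a positive opinion word, -1 for a negative one, 0 otherwise.
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        if negate:
            polarity = -polarity
        negate = False
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for text in (
        "The battery is great and the screen is excellent",
        "The camera is not good",
        "It arrived on Monday",
    ):
        print(text, "->", classify(text))
```

A fixed negation rule of this kind is exactly where the difficulty noted above arises: the sketch has no way of knowing whether a given occurrence of "not" actually inverts the meaning of the sentence, and it sees no context beyond the individual words in its lexicon.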

Findings

Discussion

Conclusion