Abstract

Sentiment analysis is widely used in a variety of applications, such as online opinion gathering for government policy directives, monitoring of customer and staff satisfaction in corporate bodies, and public tension monitoring in politics and security structures. In recent times, the field has met a new set of challenges, as new algorithms have to contend with highly unstructured sources of sentiment expression emanating from online social media fora. In this study, a rule- and lexical-based procedure is proposed together with unsupervised machine learning to implement sentiment analysis with improved generalization ability across different sources. To deal with sources devoid of syntactic and grammatical structure, the approach incorporates a rule-based technique for emoticon detection, word contraction expansion, and noise removal, together with lexicon-based text preprocessing that uses lexical features such as part of speech (POS), stop words, and lemmatization for local context analysis. A text is broken into a number of tokens, each representing a sentence, and lexicon-dependent features are then extracted from each token. The features for a given text are merged using a combining function before being used to train a machine learning classifier. The proposed combining functions draw on averaging and information gain concepts. Experimental results with different machine learning classifiers indicate that improved performance, with a great deal of generalization capacity across both structured and nonstructured sources, can be realized. The findings show that carefully designed lexical features reinforce the learning process in unsupervised learning more than using word embeddings alone as features. Experimental results obtained from the movie review dataset (recall = 74.9%, precision = 70.9%, F1-score = 72.9%, and accuracy = 72.0%) and the Twitter samples dataset (recall = 93.4%, precision = 89.5%, F1-score = 91.4%, and accuracy = 91.1%) show the efficacy of the proposed approach in comparison with other state-of-the-art research studies.
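
The preprocessing and feature-combining steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the emoticon table, contraction table, polarity lexicon, and the function names `preprocess`, `sentence_features`, and `text_features` are all hypothetical, and the three per-sentence features are chosen only for illustration. POS tagging, stop-word handling, lemmatization, the information-gain combining function, and classifier training are left out; only the averaging combiner is shown.

```python
import re
from statistics import mean

# Toy resources: hypothetical stand-ins for the lexicons a real system would use.
EMOTICONS = {":)": "happy", ":(": "sad", ":D": "happy"}
CONTRACTIONS = {"don't": "do not", "can't": "cannot", "it's": "it is", "i'm": "i am"}
POLARITY = {"good": 1.0, "great": 1.0, "happy": 1.0, "love": 1.0,
            "bad": -1.0, "sad": -1.0, "boring": -1.0, "hate": -1.0}


def preprocess(text: str) -> str:
    """Rule-based cleanup: emoticons, contractions, then noise removal."""
    text = text.replace("\u2019", "'")                  # normalize curly apostrophes
    for emoticon, word in EMOTICONS.items():            # emoticon -> sentiment word
        text = text.replace(emoticon, f" {word} ")
    text = text.lower()
    for short, full in CONTRACTIONS.items():            # expand contractions
        text = text.replace(short, full)
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text)   # drop URLs, mentions, hashtags
    text = re.sub(r"[^a-z.!?\s]", " ", text)            # keep letters and sentence marks
    return re.sub(r"\s+", " ", text).strip()


def sentence_features(sentence: str) -> list[float]:
    """Features for one sentence: [positive hits, negative hits, mean polarity]."""
    scores = [POLARITY[w] for w in sentence.split() if w in POLARITY]
    positive = sum(s > 0 for s in scores)
    negative = sum(s < 0 for s in scores)
    return [float(positive), float(negative), mean(scores) if scores else 0.0]


def text_features(text: str) -> list[float]:
    """Split into sentence tokens, extract features, and combine by averaging."""
    cleaned = preprocess(text)
    sentences = [s.strip() for s in re.split(r"[.!?]+", cleaned) if s.strip()]
    if not sentences:
        return [0.0, 0.0, 0.0]
    per_sentence = [sentence_features(s) for s in sentences]
    # Average each feature column across all sentences of the text.
    return [mean(column) for column in zip(*per_sentence)]


if __name__ == "__main__":
    print(preprocess("I can't wait!! http://t.co/x @bob #fun :)"))
    print(text_features("Great movie :D I love it. The ending was bad :("))
```

The resulting fixed-length vector is what would be passed to a classifier. Because this toy version matches words one at a time, it ignores negation ("do not love" still counts as positive), which is one reason a fuller pipeline adds local context analysis.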

Highlights

  • Sentiment analysis is a part of natural language processing (NLP) that has received tremendous attention in recent years. This may not be unconnected to the availability of social media platforms, big data storage, increased Internet connectivity and accessibility, and the unending desire of big business and governments to understand people’s opinions for policy conceptualization and monitoring

  • The major task in sentiment analysis is tagging a given text according to the opinion it expresses, which usually involves three subtasks: (i) determine the objectivity of a text, (ii) determine the polarity of a subjective text, and (iii) determine the strength of the subjective text [1]. There are two major approaches in the literature for sentiment analysis: lexicon-based and machine learning-based approaches

  • To validate the efficacy of the proposed approach, four performance metrics were used with two datasets that differ in structure and mode of expression. The metrics are Precision, Recall, F1-measure, and Accuracy. The precision metric is the fraction of True Positive results out of the total positive results predicted by the classifier, and it measures how likely a predicted positive opinion is to be correct. The recall metric is the fraction of True Positive results out of the total positive results in the gold-standard ground-truth benchmark. The last parameter, F1-measure, is the harmonic mean of the recall and precision, as expressed in equations 3–6: $\text{recall} = \frac{TP}{TP + FN}$, (3)
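
The four metrics named in the last highlight follow directly from confusion-matrix counts. A minimal sketch, assuming the standard definitions (the source text shown here spells out only precision, recall, and the harmonic-mean F1-measure in words; the accuracy formula is the conventional one). The function name `classification_metrics` and the counts in the usage line are hypothetical and are not results from the study.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute precision, recall, F1, and accuracy from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives. Zero denominators yield 0.0 instead of raising.
    """
    predicted_positive = tp + fp      # everything the classifier labelled positive
    actual_positive = tp + fn         # everything positive in the ground truth
    total = tp + fp + fn + tn

    precision = tp / predicted_positive if predicted_positive else 0.0
    recall = tp / actual_positive if actual_positive else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / total if total else 0.0

    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(classification_metrics(tp=80, fp=20, fn=10, tn=90))
```

Reporting all four together is useful because accuracy alone can look strong on an imbalanced dataset while recall on the minority class stays poor.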

Summary

Introduction

Sentiment analysis is a part of natural language processing (NLP) that has received tremendous attention in recent years. This may not be unconnected to the availability of social media platforms, big data storage, increased Internet connectivity and accessibility, and the unending desire of big business and governments to understand people’s opinions for policy conceptualization and monitoring. The major task in sentiment analysis is tagging a given text according to the opinion it expresses, which usually involves three subtasks: (i) determine the objectivity of a text (i.e., subjective or objective), (ii) determine the polarity of a subjective text (i.e., positive or negative), and (iii) determine the strength of the subjective text [1]. There are two major approaches in the literature for sentiment analysis: lexicon-based and machine learning-based approaches. Each of these approaches has its benefits and drawbacks. The lexicon-based approach is a rule-based method that computes sentiment by considering the semantic orientation of the words or phrases in the text [1]. This implies the use of a dictionary of words tagged with lexical features such as sentiment polarity orientation, part of speech (POS), and glosses. The approach represents a piece of text as a token or a bag of words where the semantic orientation of each
