A Comprehensive Survey of Sentiment Analysis: Word Embeddings Approach, Research Challenges and Opportunities

Shameer Bashir, Arvind Selwal

doi:10.2139/ssrn.3883875

Abstract

In this paper, we present a review of sentiment analysis together with the concepts of word embeddings and natural language processing, and the crucial aspects involved in building a sentiment analysis model. First, we discuss the basic steps for collecting a corpus of sentences that express sentiments or opinions, drawn from sources such as movie reviews and Twitter data. Usually, the corpus is filtered so that only the subjective (opinion-bearing) part of each sentence is retained and the rest is discarded. The resulting dataset is split into training and test sets in a balanced ratio, so that the classifiers are evaluated on unseen data and overfitting can be detected. We also discuss techniques that convert the entire dataset into a machine-readable (numerical) form, ranging from the very basic Bag-of-Words and TF-IDF models to word embeddings such as word2vec and, most recently, BERT. These vectors are fed as input to machine learning or deep learning classifiers, which predict the polarity of the subjective sentences. Finally, we identify a few research issues that remain open to the research community in this active field. One of the major challenges is to design a domain-independent sentiment analysis model using word embeddings. Another issue is the use of word embeddings for translation into local languages such as Dogri, Kashmiri, and many others.
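To make the pipeline described above concrete, here is a minimal, standard-library-only sketch. It is an illustration under our own assumptions, not the authors' method: the eight-sentence corpus is a hypothetical toy dataset, "balanced ratio" is interpreted as a class-stratified split, the TF-IDF weighting uses one common smoothed variant, and the classifier is a simple nearest-centroid model chosen for brevity. The subjectivity-filtering step and dense embeddings such as word2vec or BERT are omitted.

```python
import math
import random
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes as tokens.
    return re.findall(r"[a-z']+", text.lower())


def stratified_split(docs, labels, test_ratio=0.25, seed=0):
    # Assumption: "balanced" = each class is split separately, so train and
    # test keep the same label proportions.
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for doc, label in zip(docs, labels):
        by_label[label].append(doc)
    train, test = [], []
    for label, items in sorted(by_label.items()):
        items = list(items)
        rng.shuffle(items)
        n_test = max(1, round(len(items) * test_ratio))
        test.extend((doc, label) for doc in items[:n_test])
        train.extend((doc, label) for doc in items[n_test:])
    return train, test


def fit_idf(token_lists):
    # Smoothed inverse document frequency: idf(t) = ln((1 + N) / (1 + df(t))) + 1.
    n_docs = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    return {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}


def tfidf_vector(tokens, idf):
    # Bag-of-words counts (out-of-vocabulary tokens dropped), weighted by IDF
    # and L2-normalised; returned as a sparse {term: weight} dict.
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def dot(u, v):
    # Sparse dot product; iterate over the smaller dict.
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


class CentroidClassifier:
    """Hypothetical nearest-centroid polarity classifier over TF-IDF vectors."""

    def fit(self, docs, labels):
        token_lists = [tokenize(d) for d in docs]
        self.idf = fit_idf(token_lists)
        sums = defaultdict(Counter)
        for tokens, label in zip(token_lists, labels):
            sums[label].update(tfidf_vector(tokens, self.idf))
        # Normalise each class sum to unit length so dot product = cosine similarity.
        self.centroids = {}
        for label, vec in sums.items():
            norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
            self.centroids[label] = {t: w / norm for t, w in vec.items()}
        return self

    def predict(self, text):
        vec = tfidf_vector(tokenize(text), self.idf)
        return max(self.centroids, key=lambda label: dot(vec, self.centroids[label]))


# Hypothetical toy corpus of movie-review-style sentences.
DOCS = [
    "a wonderful, moving film with great acting",
    "great story and wonderful music",
    "i loved it, a great experience",
    "brilliant and wonderful from start to end",
    "a boring, dull film with terrible acting",
    "terrible story and awful music",
    "i hated it, a boring experience",
    "dull and awful from start to end",
]
LABELS = ["pos"] * 4 + ["neg"] * 4

if __name__ == "__main__":
    train, test = stratified_split(DOCS, LABELS)
    clf = CentroidClassifier().fit([d for d, _ in train], [y for _, y in train])
    for doc, gold in test:
        print(f"gold={gold} predicted={clf.predict(doc)} text={doc!r}")
```

A real system would typically use a library vectorizer and a stronger classifier, and would replace the sparse TF-IDF vectors with pretrained embeddings; the structure (split, vectorize, train, predict polarity) stays the same.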
