Abstract

Representing text as a vector, known as an embedding, is crucial for various classification tasks, including sentiment analysis, because it allows natural language text to be processed and understood more effectively. Embedding techniques have evolved from static approaches, such as bag-of-words and n-grams, to dynamic approaches that account for the context and meaning of words, such as word embeddings and contextualized embeddings. Word embeddings use neural networks to learn vector representations of words based on their co-occurrence patterns in large text corpora. Contextualized embeddings, such as BERT, instead consider the context of each word within a sentence or document to generate more nuanced representations. Numerous researchers have proposed modifying the original Word2Vec and BERT embeddings to incorporate sentiment information. This paper provides a comprehensive overview of these methods, including a detailed discussion of various evaluation techniques, and outlines several open challenges related to embeddings whose resolution could improve sentiment analysis results.
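As a rough illustration of the static-versus-contextualized distinction the abstract draws, the sketch below trains Word2Vec on a toy corpus (one fixed vector per word) and extracts BERT hidden states (a different vector for the same word in each sentence). It assumes the gensim and Hugging Face transformers libraries; the toy corpus, the model name bert-base-uncased, and all hyperparameters are illustrative choices, not taken from the paper.

```python
# Minimal sketch, assuming gensim and transformers are installed.
# Corpus, model name, and hyperparameters are illustrative only.
from gensim.models import Word2Vec
from transformers import AutoTokenizer, AutoModel
import torch

# Static embeddings: Word2Vec learns one fixed vector per word from
# co-occurrence patterns in the (toy) corpus.
corpus = [["the", "movie", "was", "great"],
          ["the", "plot", "was", "boring"]]
w2v = Word2Vec(corpus, vector_size=50, window=2, min_count=1, epochs=20)
print(w2v.wv["great"].shape)  # (50,) -- same vector in every context

# Contextualized embeddings: BERT produces a different vector for the
# same token depending on the surrounding sentence.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
for sentence in ["The movie was great.", "The plot was boring."]:
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state  # (1, seq_len, 768)
    print(sentence, hidden.shape)
```

Sentiment-aware variants surveyed in the paper typically start from representations like these and adjust them so that words with opposite polarity (e.g., "great" vs. "boring") are pushed apart in the vector space.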
