Abstract

Word embeddings are widely used in natural language processing to map words to numerical representations in a vector space. Their quality is influenced by factors such as the training method and the training corpus, which in turn affect downstream machine learning performance. In general, when the training method is held fixed, larger corpora yield higher-quality word embeddings and better classification accuracy. However, the content of the corpus also affects classification performance. In this work, we study the relationship between several common word embeddings and sentiment classification models through a series of comparative experiments. The results reveal that, in addition to the training method and corpus size, the corpus content and the embedding dimensionality also play a significant role in determining the quality of word embeddings. Consequently, for a specific task, these factors should be considered together to obtain better results. This work provides an improved understanding of the factors to consider in pursuit of more effective sentiment classification.
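
To make the comparative setup concrete, the following minimal sketch (illustrative only, not the authors' exact experimental pipeline) holds a simple classifier fixed while swapping in pretrained embeddings that differ in training method, corpus content, and dimensionality. The toy dataset, the averaged document vectors, and the logistic-regression classifier are all assumptions for illustration; the embedding names are standard gensim-data identifiers.

```python
import numpy as np
import gensim.downloader as api
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy labeled data (hypothetical): tokenized reviews with 0/1 sentiment.
texts = [
    ["great", "movie", "loved", "it"],
    ["terrible", "boring", "plot"],
    ["wonderful", "acting", "and", "story"],
    ["awful", "waste", "of", "time"],
    ["excellent", "touching", "film"],
    ["bad", "dull", "disappointing"],
    ["amazing", "fun", "ride"],
    ["horrible", "script", "and", "ending"],
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

def doc_vector(tokens, kv):
    """Average the vectors of in-vocabulary tokens (a simple baseline)."""
    vecs = [kv[t] for t in tokens if t in kv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(kv.vector_size)

# Pretrained embeddings from gensim-data that differ in training method
# (word2vec vs. GloVe), corpus content (news vs. Wikipedia+Gigaword vs.
# Twitter), and dimensionality (300 vs. 100). Note: downloads can be large.
for name in ["word2vec-google-news-300",
             "glove-wiki-gigaword-100",
             "glove-twitter-100"]:
    kv = api.load(name)  # returns gensim KeyedVectors
    X = np.stack([doc_vector(t, kv) for t in texts])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, labels, test_size=0.25, random_state=0, stratify=labels)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(f"{name}: accuracy = {accuracy_score(y_te, clf.predict(X_te)):.2f}")
```

With a realistic dataset in place of the toy examples, the per-embedding accuracies produced by this loop are exactly the kind of comparison the abstract describes: differences across rows reflect the combined effect of training method, corpus, and dimensionality on embedding quality.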
