Abstract

In the era of big data, natural language processing has become an important research discipline, owing to the immense quantity of text documents and the progress in machine learning. Natural language processing has been successfully employed in many different areas, including machine translation, search engines, virtual assistants, spam filtering, question answering, and sentiment analysis. Recent studies in the field indicate that word embedding based representations, in which words are represented as fixed-length vectors in dense spaces, can yield promising results. In this study, we evaluate the predictive performance of 36 word embedding based representations obtained from three word embedding methods (i.e., word2vec, fastText, and doc2vec), two basic weighting functions (i.e., inverse document frequency and smooth inverse document frequency), and three vector pooling schemes (namely, weighted sum, center-based approach, and delta rule). Experimental analysis indicates that word2vec based representation in conjunction with inverse document frequency weighting and center-based pooling yields promising results for sentiment analysis in Turkish.
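To make one of the evaluated configurations concrete, the sketch below shows how word2vec vectors can be pooled into a document vector with an inverse document frequency weighted sum. The use of gensim, the toy corpus, and all parameter values are illustrative assumptions, not the authors' actual setup.

```python
# Minimal sketch: word2vec + IDF-weighted sum pooling for document vectors.
# gensim, the toy corpus and parameters below are assumptions for illustration.
import math
import numpy as np
from gensim.models import Word2Vec

# Toy tokenized corpus standing in for a Turkish sentiment dataset.
corpus = [
    ["film", "çok", "güzel", "ve", "etkileyici"],
    ["film", "çok", "kötü", "ve", "sıkıcı"],
    ["oyuncular", "güzel", "ama", "senaryo", "zayıf"],
]

# Train word2vec on the corpus (hyperparameters are placeholders).
w2v = Word2Vec(corpus, vector_size=50, window=3, min_count=1, seed=1)

# Inverse document frequency for each vocabulary word.
n_docs = len(corpus)
df = {}
for doc in corpus:
    for word in set(doc):
        df[word] = df.get(word, 0) + 1
idf = {w: math.log(n_docs / f) for w, f in df.items()}
# Smooth IDF variant: math.log((1 + n_docs) / (1 + f)) + 1

def doc_vector(doc):
    """IDF-weighted sum of word vectors, normalized by the total weight."""
    vecs, weights = [], []
    for word in doc:
        if word in w2v.wv and word in idf:
            vecs.append(w2v.wv[word])
            weights.append(idf[word])
    if not vecs:
        return np.zeros(w2v.vector_size)
    weights = np.asarray(weights)
    return (np.asarray(vecs) * weights[:, None]).sum(axis=0) / weights.sum()

doc_vecs = np.stack([doc_vector(d) for d in corpus])
print(doc_vecs.shape)  # (3, 50): one fixed-length vector per document
```

The resulting fixed-length document vectors can then be fed to any standard classifier for the sentiment prediction step; the other weighting functions and pooling schemes named in the abstract would replace the weighting and aggregation steps above.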
