Abstract

Text classification is one of the primary tasks in many Natural Language Processing (NLP) applications. In industrial applications of NLP, sentiment analysis is the task of understanding how satisfied a user is after receiving a service or buying a product. The traditional approach is to convert a text into a numeric vector before feeding it into a machine learning algorithm. This representation of a word is referred to as a word embedding. However, traditional embedding methods often model the syntactic context of words but ignore the sentiment information of the text [1]. This can reduce the accuracy with which a classification model predicts the correct sentiment score for a text. In this paper, we present Sent2Vec, an alternative embedding representation that includes the sentiment semantics of a sentence in its embedding vector. We utilized the unsupervised Smoothed Inverse Frequency (uSIF) sentence embedding method in the Sent2Vec neural network, trained on a dataset of several million samples. The resulting sentence embeddings can be used as features in downstream (un)supervised tasks, where they lead to results better than or comparable to those of more sophisticated methods. Furthermore, with a simple logistic regression classifier, Sent2Vec achieves performance competitive with state-of-the-art results on several datasets when combined with GloVe (6B).
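As a rough illustration of the idea of turning a sentence into a numeric vector, the sketch below computes a frequency-weighted average of word vectors, with each word weighted by a / (a + p(w)). This is a simplified stand-in in the spirit of smoothed inverse frequency weighting, not uSIF itself and not the Sent2Vec network: it omits the common-component removal step, and the function names, the toy corpus, the two-dimensional word vectors, and the value of `a` are all assumptions made for the example.

```python
from collections import Counter


def sif_weights(tokenized_sentences, a=1e-3):
    """Map each word to a / (a + p(w)), where p(w) is its unigram probability.

    Frequent words get smaller weights. The default a=1e-3 is an assumed value.
    """
    counts = Counter(w for sent in tokenized_sentences for w in sent)
    total = sum(counts.values())
    return {w: a / (a + c / total) for w, c in counts.items()}


def sentence_embedding(tokens, word_vectors, weights):
    """Return the weighted average of the vectors of the known tokens.

    Tokens missing from word_vectors or weights are skipped; a sentence with
    no known tokens maps to the zero vector.
    """
    dim = len(next(iter(word_vectors.values())))
    known = [t for t in tokens if t in word_vectors and t in weights]
    if not known:
        return [0.0] * dim
    summed = [0.0] * dim
    for t in known:
        for i, x in enumerate(word_vectors[t]):
            summed[i] += weights[t] * x
    return [s / len(known) for s in summed]


# Toy data (hypothetical): real use would load pretrained vectors such as GloVe.
corpus = [
    ["the", "service", "was", "great"],
    ["the", "product", "was", "bad"],
]
word_vectors = {
    "the": [0.1, 0.1],
    "service": [0.5, 0.2],
    "was": [0.1, 0.0],
    "great": [0.9, 0.8],
    "product": [0.4, 0.3],
    "bad": [-0.8, -0.7],
}

weights = sif_weights(corpus)
for sentence in corpus:
    print(sentence, sentence_embedding(sentence, word_vectors, weights))
```

The resulting vectors could then serve as input features to a classifier such as logistic regression, as the abstract describes for the actual Sent2Vec embeddings.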
