Outlier Detection on Semantic Space for Sentiment Analysis With Convolutional Neural Networks

Murilo Falleiros Lemos Schmitt, Eduardo J Spinosa

doi:10.1109/ijcnn.2018.8489200

Abstract

Sentiment analysis is a text categorization problem that consists of automatically assigning text documents to predefined classes that represent sentiments or a positive/negative opinion about a subject. Machine learning techniques can be used to solve this task. However, to generalize well, these techniques require thorough pre-processing and an appropriate data representation. To deal with these fundamental issues, this work proposes the use of convolutional neural networks and density-based clustering algorithms. The word representations used in this work were obtained from vectors previously trained in an unsupervised way, known as word embeddings. These representations are able to capture syntactic and semantic information about words, so that similar words are projected closer together in the semantic space. In this scenario, to improve the performance of the convolutional neural network, we propose using a clustering algorithm in the semantic space to extract additional information from the data. A density-based clustering algorithm was used to detect and remove outliers from the documents to be classified before these documents were used to train the convolutional neural network. We conducted experiments with two different embeddings across three datasets to validate the effectiveness of our method. Results show that removing outliers from documents can slightly improve the accuracy of the model and reduce the computational cost of the non-static training approach.
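
The outlier-removal step described above can be illustrated with a minimal sketch. It assumes a DBSCAN-style noise rule (a word is an outlier if it is neither a core point nor within `eps` of one) applied to the word vectors of a single document; the abstract does not name the specific algorithm, its parameters, or whether clustering is done per document, so `eps`, `min_samples`, the helper names, and the two-dimensional toy embeddings are all hypothetical.

```python
import math

def dbscan_noise_mask(vectors, eps, min_samples):
    """Return True for each vector that DBSCAN would label as noise.

    A point is a core point if at least `min_samples` points (itself
    included) lie within distance `eps`. A point is noise if it is not
    a core point and has no core point within `eps`.
    """
    n = len(vectors)
    neighbors = [
        [j for j in range(n) if math.dist(vectors[i], vectors[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_samples for nb in neighbors]
    return [
        not is_core[i] and not any(is_core[j] for j in neighbors[i])
        for i in range(n)
    ]

def remove_outlier_words(tokens, embeddings, eps=0.5, min_samples=3):
    """Drop tokens whose embeddings are noise in the semantic space.

    Assumption: tokens without an embedding are discarded before
    clustering; the paper may handle them differently.
    """
    known = [t for t in tokens if t in embeddings]
    noise = dbscan_noise_mask([embeddings[t] for t in known], eps, min_samples)
    return [t for t, is_noise in zip(known, noise) if not is_noise]

# Hypothetical 2-D embeddings; real word embeddings have hundreds of dimensions.
toy_embeddings = {
    "great": (0.9, 0.8),
    "good": (1.0, 0.9),
    "excellent": (0.95, 1.0),
    "movie": (1.1, 0.7),
    "carburetor": (-3.0, 4.0),  # far from every other word in this document
}

doc = ["great", "movie", "carburetor", "good", "excellent"]
filtered = remove_outlier_words(doc, toy_embeddings)
print(filtered)
```

In this toy document the four nearby words form one dense region, while the isolated word has no neighbors within `eps` and is removed. The filtered token sequence is what would then be passed to the convolutional neural network, which is shorter than the original and so plausibly cheaper to process.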

Full Text