Abstract
The growth of online review sites makes it possible to find out what other people think at scale. Sentiment analysis is a popular text classification task in the data mining domain. In this research, we develop a machine learning solution that classifies customer reviews as positive or negative by predicting the overall sentiment of the review text at the document level. We carry out the experiment with two approaches: traditional machine learning and deep learning. In the first approach, we train four traditional machine learning classifiers on TF-IDF features built from n-grams: multinomial naive Bayes, logistic regression, k-nearest neighbour and random forest. Of these four classifiers, logistic regression achieves the highest accuracy. In the second approach, we use word2vec embeddings as input to a sequential deep neural network. The accuracy we achieve with deep learning is much lower than with the traditional machine learning approach. Having identified the best-performing approach and classifier, we build our final logistic regression model using two further techniques: the Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance, and Shuffle Split cross-validation. The final logistic regression model reaches an accuracy of approximately 87%, about 3 percentage points higher than in the initial experiments. Our first finding is that, on smaller datasets, traditional machine learning can outperform deep learning models in terms of accuracy and other evaluation metrics. Our second finding is that addressing class imbalance in the dataset can improve the accuracy of the model.
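To make the feature extraction step concrete, the following is a minimal sketch of TF-IDF weighting over word n-grams, written in plain Python. It is an illustration under stated assumptions, not the implementation used in this work: the function names `word_ngrams` and `tfidf_vectors`, the whitespace tokenizer, the n-gram range of 1 to 2, the smoothed IDF formula and the L2 normalization are all choices made for this example, and the sample reviews are invented.

```python
import math
from collections import Counter


def word_ngrams(text, n_min=1, n_max=2):
    """Lowercase, split on whitespace, and return all word n-grams
    for n in [n_min, n_max]. (Hypothetical helper for this sketch.)"""
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_vectors(docs, n_min=1, n_max=2):
    """Return (vocabulary, vectors), where each vector is a sparse dict
    mapping an n-gram to its L2-normalized TF-IDF weight."""
    doc_grams = [word_ngrams(d, n_min, n_max) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each n-gram occurs.
    df = Counter()
    for grams in doc_grams:
        df.update(set(grams))

    # Assumed smoothed IDF variant: ln((1 + N) / (1 + df)) + 1.
    idf = {g: math.log((1 + n_docs) / (1 + c)) + 1.0 for g, c in df.items()}

    vectors = []
    for grams in doc_grams:
        tf = Counter(grams)  # raw term counts within the document
        vec = {g: count * idf[g] for g, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({g: w / norm for g, w in vec.items()})
    return sorted(idf), vectors


# Invented example reviews, for illustration only.
reviews = ["great product works great", "terrible product broke fast"]
vocab, vecs = tfidf_vectors(reviews)
for gram, weight in sorted(vecs[0].items(), key=lambda kv: -kv[1]):
    print(f"{gram!r}: {weight:.3f}")
```

In this sketch, an n-gram that appears in every review (such as "product") receives a lower weight than one that is frequent in a single review (such as "great"), which is the property that makes TF-IDF features useful as input to a classifier such as logistic regression.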