Abstract

This study seeks the most accurate model for sentiment analysis of marketplace product reviews, combining text preprocessing, word2vec embeddings, and a convolutional neural network (CNN). We collected 20,986 reviews of 720 products from a marketplace by web scraping, then cleaned and labeled the data, yielding 515 positive and 490 negative reviews. We preprocessed the data under four scenarios and trained word2vec to obtain word vector representations, which we then fed to a CNN to classify review sentiment. After testing multiple variations of each technique, we found that the best results came from combining the third preprocessing scenario (case folding, punctuation removal, word normalization, and stemming), the second word2vec parameter combination (size 50, window 2, hs 0, and negative 10), and the fourth CNN parameter combination (kernel size 2, dropout 0.2, and learning rate 0.01): accuracy of 99.00%, precision of 98.96%, and recall of 98.96%. Word normalization contributed substantially to accuracy by correcting misspelled or nonstandard words in the reviews. In the word2vec evaluation, hs 0 produced a higher average accuracy than hs 1; with hs 0 the model is trained with negative sampling rather than hierarchical softmax, which helped it capture the context of the trained words. For the CNN, a higher learning rate lets the model learn faster but can make training unstable, whereas a lower learning rate makes training more stable but slower.
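To make the best-performing preprocessing scenario concrete, the following is a minimal sketch of its four steps (case folding, punctuation removal, word normalization, stemming) chained in that order. It is not the authors' implementation: the `NORMALIZATION` table, the `SUFFIXES` list, the toy suffix-stripping stemmer, and all function names are assumptions made for illustration, and the English sample words stand in for whatever language the reviews were written in.

```python
import string

# Hypothetical normalization table (an assumption, not from the paper):
# maps informal or misspelled tokens to their standard forms.
NORMALIZATION = {"gud": "good", "thx": "thanks", "u": "you"}

# Hypothetical suffix list for a toy stemmer. A real pipeline would use a
# language-specific stemmer instead of this simplification.
SUFFIXES = ("ing", "ed", "s")


def case_fold(text):
    """Lowercase the whole review."""
    return text.lower()


def remove_punctuation(text):
    """Delete every ASCII punctuation character."""
    return text.translate(str.maketrans("", "", string.punctuation))


def normalize_words(tokens):
    """Replace tokens found in the normalization table; keep the rest."""
    return [NORMALIZATION.get(token, token) for token in tokens]


def stem(token):
    """Strip the first matching suffix if at least 3 characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(review):
    """Apply the four steps in order and return a list of tokens."""
    text = remove_punctuation(case_fold(review))
    tokens = normalize_words(text.split())
    return [stem(token) for token in tokens]


if __name__ == "__main__":
    print(preprocess("Gud product, THX!!"))
```

Token lists produced this way would be the input for word2vec training, whose vectors are then passed to the CNN classifier as described above.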
