Abstract

This research tests the performance of Neighbor Weighted K-Nearest Neighbor (NWKNN) in handling imbalanced datasets for aspect-based sentiment analysis. The data are 2,449 beauty product reviews obtained from the Kaggle site. Before classification, every review goes through preprocessing, which consists of case folding, cleaning, tokenization, normalization, stemming, negation conversion, and stopword removal. So that the preprocessed reviews can be handled by a classification algorithm, each one is then passed to feature extraction, for which this research uses TF-IDF. The extracted features are the input to classification. Because multilabel handling in this research uses the binary relevance technique, each review is classified several times, each time with NWKNN. Classification was carried out four times, once for each aspect used in this research: price, packaging, effectiveness, and aroma. Each classification produces a polarity for its aspect: positive, negative, or non-sentimental. Performance testing with a confusion matrix showed that NWKNN achieved a higher F1-score than KNN for every aspect. The optimal parameter values for NWKNN were k = 40 and e = 2. These results indicate that NWKNN performs better than KNN when the dataset is imbalanced.
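
As a rough illustration of the classification step described above, the following Python sketch shows one way NWKNN with binary relevance could look. It is not the authors' implementation. It assumes the commonly cited NWKNN class weight, 1 / (N_i / N_min)^(1/e), cosine similarity between TF-IDF vectors stored as sparse dicts, and at least one neighbor (k >= 1). The function names (`class_weights`, `nwknn_predict`, `predict_aspects`) are hypothetical; the abstract specifies none of these details beyond k = 40 and e = 2.

```python
import math
from collections import Counter, defaultdict


def class_weights(labels, e=2.0):
    """Assumed NWKNN weight: 1 / (N_i / N_min) ** (1 / e).

    The smallest class gets weight 1; larger classes get weights below 1.
    """
    counts = Counter(labels)
    n_min = min(counts.values())
    return {c: 1.0 / (n / n_min) ** (1.0 / e) for c, n in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nwknn_predict(train_vecs, train_labels, query, k=40, e=2.0):
    """Predict one label: class weight times summed similarity of its neighbors."""
    weights = class_weights(train_labels, e)
    # Keep the k training items most similar to the query.
    neighbors = sorted(
        ((cosine(query, vec), label) for vec, label in zip(train_vecs, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    # Sum similarities per class among those neighbors.
    scores = defaultdict(float)
    for sim, label in neighbors:
        scores[label] += sim
    # Down-weight large classes so the majority does not win by count alone.
    return max(scores, key=lambda c: weights[c] * scores[c])


def predict_aspects(train_vecs, labels_by_aspect, query, k=40, e=2.0):
    """Binary relevance: run one independent NWKNN classifier per aspect."""
    return {
        aspect: nwknn_predict(train_vecs, labels, query, k, e)
        for aspect, labels in labels_by_aspect.items()
    }
```

In this sketch, `labels_by_aspect` would map each of the four aspects (price, packaging, effectiveness, aroma) to a list of polarity labels aligned with `train_vecs`. A very large `e` pushes all class weights toward 1, which makes the score behave like similarity-weighted KNN without class balancing.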
