Customers use reviews as a primary source of information to judge a product or service. Positive reviews help boost companies’ reputations, increasing their revenue by attracting new clients, and increasing the purchasing order size. On the other hand, negative reviews significantly reduce sales, which might be the case due to competitive advantage. Organizations can use fake (i.e., misleading or fraudulent) reviews to generate fast profits by deceiving customers into buying their products. Recently, various methods to assess the legitimacy of reviews have been introduced using advances in machine learning. However, existing methods fall short of achieving highly accurate detection results for unbalanced classes. We aimed to create a spam review identification model using ensemble-based learning while balancing classes using sampling techniques. This article proposes a weighted stacking ensemble model with sampling (WSEM-S) for efficient fake reviews detection. We used <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$n$</tex-math> </inline-formula> -gram models to effectively model language data for feature retrieval. The experimental results on three customer reviews datasets: YELPNYC, Deceptive Opinion Spam Corpus (DOSC) v1.4, and Deception datasets show that the proposed model outperforms the conventional machine learning techniques Naïve Bayes, logistic regression, K-nearest neighbor (KNN), random forest, extreme gradient boosting (XGBoost), and convolutional neural network (CNN) as well as the state-of-the-art ensemble models.
Read full abstract