Deceptive reviews are now pervasive on online platforms, where posted product reviews strongly shape what customers believe and buy. This calls for research that applies mature, advanced AI models to large-scale datasets in order to separate genuine reviews from fake ones effectively and efficiently. This work addresses counterfeit product reviews by applying, assessing and analysing two suitable models: an Elastic-Net classifier based on block coordinate descent, accompanied by word-cloud analysis, and a LightGBM trees classifier with grid search and early stopping as a further performance enhancement. Log-Loss is the performance metric used in the experiments, which aim to give insight into the detection, diagnosis and reduction of fake product reviews. The paper also delineates the discriminating and favourable aspects of the quality, statistics, stability and standards of the dataset, which was created using Large Language Models (LLMs); LLMs now allow machines to produce bogus reviews at large scale and low cost, in place of Amazon Mechanical Turk workers. The LightGBM classifier achieves a holdout Log-Loss score of 0.1462, outperforming the Elastic-Net classifier. The results also support Log-Loss as a more informative metric than ROC-AUC in this setting, because it reflects how close the predicted probability is to the actual (true) label.
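The contrast between Log-Loss and ROC-AUC drawn above can be illustrated with a minimal sketch. The labels and probabilities below are invented toy values, not data from the paper, and the encoding 1 = fake, 0 = genuine is an assumption made only for this example. Both sets of predictions rank the reviews identically, so a rank-based metric cannot tell them apart, while Log-Loss rewards the predictions that sit closer to the true labels.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; lower is better."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def roc_auc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = fake review, 0 = genuine review.
y_true = [1, 1, 0, 0]
confident = [0.95, 0.90, 0.10, 0.05]  # probabilities close to the true labels
hesitant = [0.60, 0.55, 0.45, 0.40]   # same ranking, but close to 0.5

for name, probs in [("confident", confident), ("hesitant", hesitant)]:
    print(name, "AUC:", roc_auc(y_true, probs), "Log-Loss:", round(log_loss(y_true, probs), 4))
```

Both prediction sets reach the same ROC-AUC because only the ordering matters to that metric, whereas the Log-Loss of the hesitant predictions is markedly higher. This is the sense in which Log-Loss tracks the proximity of the predicted probability to the true value.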