Abstract

Fake reviews can mislead consumers, and large numbers of them can even cause severe property losses and public-opinion crises. It is therefore necessary to detect and filter fake reviews. However, most existing methods detect fake reviews with relatively low accuracy because they rely on a single type of feature and lack labeled experimental data. To address this problem, we propose a novel fake review detection method based on multiple feature fusion and rolling collaborative training. First, the method requires an initial index system covering multiple features, such as the text features and sentiment features of reviews and the behavior features of reviewers. Second, the method needs an initial training sample set, so we designed algorithms to extract all the features of a review, and each review's class is then labeled manually. Finally, the method uses the initial sample set to train seven classifiers, and the most accurate classifier is selected to classify new reviews. The novelty of the method is that the features and predicted class labels of new reviews are added to the sample set as new samples, so the sample set grows automatically. Experiments on reviews from the Yelp website show that the proposed method detects fake reviews with an accuracy of 84.45%, which is 3.5% higher than the baseline methods, and its precision is 5.3% higher than that of the latest deep learning model. A Friedman test shows statistically that the support vector machine (SVM) and random forest (RF) classifiers perform best. These results indicate that our method, which uses multiple features, is more accurate than the baseline models, and that it also addresses the shortage of labeled training samples in fake review detection.
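The abstract describes the rolling training loop only at a high level: train several classifiers on the labeled set, pick the most accurate one, use it to label new reviews, and append those reviews to the sample set. The sketch below is one minimal, self-training-style reading of that loop using scikit-learn (the library listed in the experimental platform). It assumes, without confirmation from the source, that the best classifier is chosen by 3-fold cross-validated accuracy, that new reviews arrive in batches, and that every predicted label is accepted. Only three of the seven classifier families (SVM, DT, RF) are included, the data are synthetic, and the helper names `make_candidates`, `select_best` and `rolling_training` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def make_candidates():
    """Fresh, unfitted classifiers: three of the seven families, chosen for illustration."""
    return {
        "SVM": SVC(kernel="rbf", gamma="scale"),
        "DT": DecisionTreeClassifier(random_state=0),
        "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    }


def select_best(candidates, X, y, cv=3):
    """Pick the candidate with the highest mean cross-validated accuracy (assumed criterion)."""
    scores = {name: cross_val_score(clf, X, y, cv=cv, scoring="accuracy").mean()
              for name, clf in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def rolling_training(X_seed, y_seed, new_batches, candidate_factory=make_candidates):
    """Label each incoming batch with the current best classifier and fold it into the sample set."""
    X_train, y_train = X_seed, y_seed
    history = []
    for X_new in new_batches:
        candidates = candidate_factory()
        name, cv_acc = select_best(candidates, X_train, y_train)
        model = candidates[name].fit(X_train, y_train)
        y_new = model.predict(X_new)
        # Predicted labels become training labels, so the sample set grows automatically.
        X_train = np.vstack([X_train, X_new])
        y_train = np.concatenate([y_train, y_new])
        history.append((name, float(cv_acc), len(y_train)))
    return X_train, y_train, history


# Synthetic stand-in for fused review features (1 = fake, 0 = genuine).
X, y = make_classification(n_samples=400, n_features=12, random_state=0)
X_final, y_final, history = rolling_training(
    X[:100], y[:100], [X[100:200], X[200:300], X[300:400]]
)
for name, cv_acc, size in history:
    print(f"best={name}  cv_accuracy={cv_acc:.3f}  sample_set_size={size}")
```

Because every predicted label is accepted, misclassifications feed into later rounds; a confidence threshold is a common safeguard, but the abstract does not say whether the authors use one. This sketch is also closer to self-training than to classic two-view co-training, since all classifiers see the same fused feature set.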

Highlights

  • In online shopping, the product information consumers see can differ from the products they actually receive offline, so consumers read large numbers of reviews of a target product to inform their judgment [1]

  • The method analyzes the relationships among the features, formulates a review credibility evaluation index system, designs feature extraction and quantification methods, and constructs a fake review detection model based on multi-feature fusion and rolling collaborative training

  • This article uses support vector machine (SVM), decision tree (DT) and random forest (RF) classifiers, which gave better classification results, compares the three, and selects RF, which performs best, as the control group. The other methods compared are Semi-supervised [45], a semi-supervised learning method that reinforces a single classifier, here RF; Co-training, the standard collaborative training algorithm, which uses the original, unprocessed feature set as input for model training; and the method proposed in this article, which adds text representation features and sentiment features to the original features and trains the classifier through rolling updates of the sample set
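The Friedman test cited in the abstract ranks the classifiers within each block of results (for example, each cross-validation fold) and checks whether their average ranks differ more than chance would allow. The pure-Python sketch below computes the standard rank-based statistic for the three classifiers named above. The per-fold accuracies are hypothetical placeholders chosen to exercise tie handling, not results from the paper. With k = 3 classifiers the statistic has k − 1 = 2 degrees of freedom, for which the chi-square p-value has the closed form exp(−χ²/2). No tie correction is applied, so the value can differ slightly from `scipy.stats.friedmanchisquare` when ties occur.

```python
import math


def average_ranks(row):
    """Rank scores within one block (1 = highest score); tied scores share their mean rank."""
    order = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
    ranks = [0.0] * len(row)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and row[order[end + 1]] == row[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions start..end
        for idx in order[start:end + 1]:
            ranks[idx] = shared
        start = end + 1
    return ranks


def friedman_statistic(scores):
    """scores[b][j]: score of classifier j on block b. Returns (chi2_F, mean ranks)."""
    n, k = len(scores), len(scores[0])
    rank_rows = [average_ranks(row) for row in scores]
    mean_ranks = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    # chi2_F = 12N / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4), without tie correction
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4)
    return chi2, mean_ranks


names = ["SVM", "DT", "RF"]
# Hypothetical per-fold accuracies (placeholders, not the paper's results).
fold_scores = [
    [0.84, 0.79, 0.83],
    [0.82, 0.78, 0.84],
    [0.85, 0.80, 0.85],
    [0.83, 0.77, 0.82],
    [0.84, 0.81, 0.86],
    [0.83, 0.79, 0.83],
]
chi2, mean_ranks = friedman_statistic(fold_scores)
p_value = math.exp(-chi2 / 2)  # chi-square survival function, valid only for 2 degrees of freedom
print(dict(zip(names, mean_ranks)), f"chi2={chi2:.2f}", f"p={p_value:.4f}")
```

A significant result only says that the classifiers are not all equivalent; identifying which pairs differ requires a post-hoc test such as the Nemenyi test.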


Summary

INTRODUCTION

In online shopping, the product information consumers see can differ from the products they actually receive offline, so consumers read large numbers of reviews of a target product to inform their judgment [1]. The method analyzes the relationships among the features, formulates a review credibility evaluation index system, designs feature extraction and quantification methods, and constructs a fake review detection model based on multi-feature fusion and rolling collaborative training. Irrelevant data are deleted, and a classification model for fake review detection is built on multi-feature fusion and rolling collaborative training (see section III.D).

PURPOSE OF THE EXPERIMENT
In this study, review texts from the e-commerce domain are used as the experimental data set to test the sentiment intensity calculation method and the validity of the Doc2vec text representation model.

EXPERIMENTAL PLATFORM
The algorithms in this research were run in a Windows environment on an Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz with 8 GB of RAM, using Python 3.7.0, TensorFlow 1.13.1, Gensim 3.8.0 and scikit-learn 0.20.1. Text segmentation and part-of-speech tagging are performed with the NLTK toolkit.
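Multi-feature fusion combines a text representation (the authors use Doc2vec), sentiment features and reviewer behavior features into one vector per review. The sketch below shows that combination step under loudly stated assumptions: the Doc2vec vector is treated as already computed (the placeholder values stand in for it), and the toy lexicons, the degree-adverb and negation scoring rule, and the three behavior features (review count, mean rating deviation, share of 1- or 5-star ratings) are hypothetical illustrations, since this section specifies neither the authors' index system nor their sentiment intensity formula. The `ReviewerHistory` class and all function names are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical toy lexicons; a real system would use full sentiment and degree dictionaries.
POLARITY = {"good": 1.0, "great": 1.0, "love": 1.0, "bad": -1.0, "terrible": -1.0, "broken": -1.0}
DEGREE = {"slightly": 0.5, "very": 1.5, "extremely": 2.0}
NEGATION = {"not", "no", "never"}


def sentiment_intensity(tokens: Sequence[str]) -> float:
    """Sum polarity words, scaled by preceding degree adverbs and flipped by preceding negations."""
    score, weight, negate = 0.0, 1.0, False
    for tok in tokens:
        if tok in DEGREE:
            weight *= DEGREE[tok]
        elif tok in NEGATION:
            negate = not negate
        elif tok in POLARITY:
            value = POLARITY[tok] * weight
            score += -value if negate else value
            weight, negate = 1.0, False  # modifiers apply only to the next polarity word
    return score


@dataclass
class ReviewerHistory:
    review_count: int
    ratings: List[int]            # star ratings this reviewer gave
    product_means: List[float]    # average rating of each product they reviewed


def behavior_features(r: ReviewerHistory) -> List[float]:
    """Review volume, mean deviation from product averages, share of 1- or 5-star ratings."""
    deviation = sum(abs(a - m) for a, m in zip(r.ratings, r.product_means)) / len(r.ratings)
    extreme = sum(1 for a in r.ratings if a in (1, 5)) / len(r.ratings)
    return [float(r.review_count), deviation, extreme]


def fuse_features(doc_vector: Sequence[float], text: str,
                  reviewer: ReviewerHistory) -> List[float]:
    """Concatenate text-representation, text-length, sentiment and behavior features."""
    tokens = text.lower().split()  # stand-in for NLTK tokenization
    text_feats = [float(len(tokens)), sentiment_intensity(tokens)]
    return list(doc_vector) + text_feats + behavior_features(reviewer)


doc_vec = [0.12, -0.40, 0.33]  # placeholder for a Doc2vec document vector
reviewer = ReviewerHistory(review_count=2, ratings=[5, 5], product_means=[3.5, 4.0])
row = fuse_features(doc_vec, "Very good product , not bad at all", reviewer)
print(row)
```

Concatenation keeps each feature group interpretable, but the groups live on very different scales (review counts versus vector components), so standardizing each column before training an SVM is usually advisable.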

EXPERIMENTAL SETUP AND RESULT ANALYSIS
DISCUSSION
CONCLUSION