Abstract

Since the outbreak of COVID-19 in Wuhan in 2019, Sina Weibo, which has more than 200 million daily active users, has become a platform on which rumors spread rapidly. We therefore focus on epidemic-related rumors posted on Sina Weibo during the epidemic. Surveying the features extracted through feature engineering in rumor-detection research from 2011 to the present, we find that existing features make little use of the user's geographical location. Since April 28, 2022, Sina Weibo has publicly displayed each user's IP location, which has greatly improved the reliability of this attribute. Even so, its use is still limited to extracting IP location as an explicit feature. Moreover, existing research has neither paid attention to the platform's filtering mechanism when extracting features nor taken advantage of this phenomenon. To address these shortcomings, we propose two new indicators based on Sina Weibo's filtering mechanism: available comments and first-class available comments. Available comments are the comments in a microblog's comment area that have not been filtered out by the filtering mechanism. First-class available comments are the available comments that reply directly to the original microblog. Building on first-class available comments and users' IP location, we further propose 10 new features. Finally, we train seven classifiers: logistic regression, SVM with a linear kernel, SVM with an RBF kernel, Naive Bayes, random forest, DNN, and CNN.
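The two indicators can be illustrated with a minimal Python sketch. It assumes a hypothetical `Comment` record in which `filtered` marks a comment removed by the platform's filtering mechanism and `reply_to` is `None` when the comment replies directly to the original microblog. Neither field comes from the paper or from any Weibo API. The function `ip_location_features` and its outputs are hypothetical examples of features that combine first-class available comments with IP location; they are not the 10 features proposed in the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Comment:
    comment_id: str
    reply_to: Optional[str]  # assumption: None means a direct reply to the original microblog
    filtered: bool           # assumption: True if hidden by the platform's filtering mechanism
    ip_location: str         # publicly displayed IP location, e.g. a province name


def available_comments(comments):
    """Comments that were not removed by the filtering mechanism."""
    return [c for c in comments if not c.filtered]


def first_class_available_comments(comments):
    """Available comments that reply directly to the original microblog."""
    return [c for c in available_comments(comments) if c.reply_to is None]


def ip_location_features(comments, poster_location):
    """Hypothetical features combining first-class available comments with IP location."""
    first_class = first_class_available_comments(comments)
    n = len(first_class)
    same_as_poster = sum(c.ip_location == poster_location for c in first_class)
    return {
        "n_available": len(available_comments(comments)),
        "n_first_class_available": n,
        "n_distinct_ip_locations": len({c.ip_location for c in first_class}),
        # Guard against microblogs with no first-class available comments.
        "share_same_location_as_poster": same_as_poster / n if n else 0.0,
    }


# Toy comment area for a single microblog (invented data).
comments = [
    Comment("c1", None, False, "Hubei"),
    Comment("c2", "c1", False, "Beijing"),   # reply to another comment: not first-class
    Comment("c3", None, True, "Hubei"),      # filtered: not available
    Comment("c4", None, False, "Guangdong"),
]
feats = ip_location_features(comments, poster_location="Hubei")
print(feats)
```

In this toy comment area, three comments are available, and two of them (`c1` and `c4`) are first-class available comments.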
By comparing the classifiers' predictive performance with one another, and before and after adding the new features, we conclude that the random forest classifier performs best in this application scenario. The new features raise the random forest classifier's accuracy by 1.33%, its precision on positive cases by 2%, its precision on negative cases by 5%, and its recall on negative cases by 14%, while leaving its recall on positive cases unchanged.
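Accuracy and the per-class precision and recall reported above can be computed as in the following sketch. It assumes that label 1 marks a positive case and label 0 a negative case. The predictions are toy values invented for illustration and do not reproduce the paper's results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_precision_recall(y_true, y_pred, labels=(1, 0)):
    """Precision and recall computed separately for each class label."""
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)  # cases predicted as this label
        actual = sum(t == label for t in y_true)     # cases truly of this label
        scores[label] = {
            "precision": tp / predicted if predicted else 0.0,
            "recall": tp / actual if actual else 0.0,
        }
    return scores


# Toy labels (1 = positive, 0 = negative) and predictions from a model trained
# without and with additional features. All values are invented for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
pred_before = [1, 1, 1, 0, 1, 1, 0, 0]
pred_after = [1, 1, 1, 0, 1, 0, 0, 0]

before = per_class_precision_recall(y_true, pred_before)
after = per_class_precision_recall(y_true, pred_after)
print("accuracy change:", accuracy(y_true, pred_after) - accuracy(y_true, pred_before))
for label in (1, 0):
    for metric in ("precision", "recall"):
        print(label, metric, "change:", after[label][metric] - before[label][metric])
```

On this toy data, recall on the positive class stays at 0.75 while accuracy and the other three per-class metrics rise. This shows how a result such as "recall of positive cases unchanged" can coexist with gains elsewhere.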

