A semi-supervised model for Persian rumor verification based on content information

Zoleikha Jahanbakhsh-Nagadeh,Mohammad-Reza Feizi-Derakhshi,Arash Sharifi

doi:10.1007/s11042-020-10077-3

Abstract

Rumor is a collective attempt to interpret a vague but attractive situation by using the power of words. In social networks, false-rumors may have significantly different contextual characteristics from true-rumors at lexical, syntactic, semantic levels. Therefore, this study presents the BERT-SAWS semi-supervised learning model for early verification of Persian rumor by investigating content-based and context features at three views: Contextual Word Embeddings (CWE), speech act, and Writing Style (WS). This model is built by loading pre-trained Bidirectional Encoder Representations from Transformers (BERT) as an unsupervised language representation, fine-tuning it using a small Persian rumor dataset, and combining with a supervised learning model to provide an enriched text representation of the content of the rumor. This text representation enables the model to have a better comprehending of the rumor language to verify rumors better than baseline models for two reasons: (i) early rumor verification by focusing on content-based and context-based features of the source rumor. (ii) overcoming the problem of the shortcoming of the dataset in deep neural networks by loading pre-trained BERT, fine-tuning it using the Persian rumor dataset, and combining with speech act and WS-based features. The empirical results of applying the model on Twitter and Telegram datasets demonstrated that BERT-SAWS can enhance the performance of the classifier from 2% to 18%. It indicates that speech act and WS alongside semantic contextual vectors are helpful features in the rumor verification task.

Full Text