Detecting Persuasion Attempts on Social Networks: Unearthing the Potential of Loss Functions and Text Pre-Processing in Imbalanced Data Settings

Rúben Teimas,José Saias

doi:10.3390/electronics12214447

Abstract

The rise of social networks and the increasing amount of time people spend on them have created a perfect place for the dissemination of false narratives, propaganda, and manipulated content. In order to prevent the spread of disinformation, content moderation is needed. However, manual moderation is unfeasible due to the large amount of daily posts. This paper studies the impact of using different loss functions on a multi-label classification problem with an imbalanced dataset, consisting of 20 persuasion techniques and only 950 samples, provided by SemEval’s 2021 Task 6. We used machine learning models, such as Naive Bayes and Decision Trees, and a custom deep learning architecture, based on DistilBERT and Convolutional Layers. Overall, the machine learning models achieved far worse results than the deep learning model, using Binary Cross Entropy, which we considered our baseline deep learning model. To address the class imbalance problem, we trained our model using different loss functions, such as Focal Loss and Asymmetric Loss. The latter providing the best results, particularly for the least represented classes.

Full Text