A Multi-label Classification System to Distinguish among Fake, Satirical, Objective and Legitimate News in Brazilian Portuguese

Janaína Ignacio De Morais, Sylvio Barbon Jr, André Azevedo Da Fonseca, Gabriel Marques Tavares, Hugo Queiroz Abonizio

doi:10.5753/isys.2020.833

Open Access

Abstract

The diffusion of fake news has increased significantly worldwide, especially in politics, where misinformation can be propagated during election debates around the world. However, news with a recreational purpose, such as satirical news, is often confused with objective fake news. In this work, we address the differences between the objectivity and the legitimacy of news documents, treating each article as belonging to two conceptual classes: objective/satirical and legitimate/fake. We propose a DSS (Decision Support System) based on a Text Mining (TM) pipeline with a set of novel textual features, using multi-label methods to classify news articles on these two domains. A set of multi-label methods was evaluated with a combination of different base classifiers and then compared with a multi-class approach. A set of real-life news data was collected from several Brazilian news portals for these experiments. The results show our DSS to be adequate (0.80 f1-score) for the scenario of misleading news under the challenging multi-label perspective, where the multi-class methods (0.01 f1-score) were outperformed by the proposed method. Moreover, we analyzed how each group of stylometric features used in the experiments influences the result, aiming to discover whether a particular group is more relevant than the others. The complexity group of features appeared to be more relevant than the others.
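To make the two problem framings in the abstract concrete, the following minimal sketch shows how one article's pair of labels can either be kept as two binary labels (multi-label) or collapsed into a single four-way class (multi-class), and how an f1-score can be computed per label. The label order (satirical, fake), the function names, and the toy predictions are assumptions made for illustration; the abstract does not say how the reported f1-scores were averaged, so the macro average over labels used here is only one plausible choice.

```python
from typing import List, Tuple

# Each article carries two binary labels (order is an assumption):
# (is_satirical, is_fake). (0, 0) means objective and legitimate.
Labels = Tuple[int, int]


def to_multiclass(y: Labels) -> int:
    """Collapse the label pair into one of four classes (0..3)."""
    satirical, fake = y
    return 2 * satirical + fake


def f1(true: List[int], pred: List[int]) -> float:
    """Binary F1 for the positive class (1); 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(true, pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1_per_label(y_true: List[Labels], y_pred: List[Labels]) -> float:
    """Average the binary F1 over the two label columns."""
    scores = []
    for j in range(2):
        scores.append(f1([y[j] for y in y_true], [y[j] for y in y_pred]))
    return sum(scores) / len(scores)


# Toy ground truth and predictions (invented, not data from the paper).
y_true = [(0, 0), (0, 1), (1, 0), (1, 1)]
y_pred = [(0, 0), (0, 1), (1, 1), (1, 1)]

print([to_multiclass(y) for y in y_true])
print(macro_f1_per_label(y_true, y_pred))
```

In this toy case the satirical label is predicted perfectly (F1 of 1.0) and the fake label has one false positive (F1 of 0.8), so the macro average is about 0.9.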

Highlights

Nowadays, the way citizens consume, interpret, and process news has changed substantially, mainly due to the ease of communication on social networks
The results indicate that Random Forest was the machine learning algorithm with the best result for both the multi-class and multi-label approaches, appearing again in third place as the base classifier of Binary Relevance
This is a reasonable result because Random Forest (RF) is an ensemble approach that creates random weak classifiers, which in turn vote for the final decision, making the model robust to outliers and noise and less prone to overfitting [Breiman 2001]
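The two ideas named in these highlights, Binary Relevance and an ensemble of randomized weak classifiers that vote, can be sketched roughly as follows. This is an illustrative toy and not the authors' pipeline: the stump learner, all function names, the two-feature data, and the label order (satirical, fake) are assumptions, and the voting ensemble is a much-simplified stand-in for Random Forest [Breiman 2001] rather than an implementation of it.

```python
import random

def fit_stump(X, y, feature):
    """Fit a one-feature threshold rule with the fewest training errors."""
    best = (len(y) + 1, 0.0, 1)  # (errors, threshold, polarity)
    for thr in sorted({x[feature] for x in X}):
        for polarity in (1, -1):
            preds = [1 if polarity * (x[feature] - thr) >= 0 else 0 for x in X]
            errors = sum(p != t for p, t in zip(preds, y))
            if errors < best[0]:
                best = (errors, thr, polarity)
    _, thr, polarity = best
    return lambda x: 1 if polarity * (x[feature] - thr) >= 0 else 0


def fit_voting_ensemble(X, y, n_estimators=25, seed=0):
    """Train stumps on bootstrap samples with a random feature each; predict by majority vote."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        feature = rng.randrange(d)                  # random feature choice
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return lambda x: 1 if 2 * sum(s(x) for s in stumps) > len(stumps) else 0


def fit_binary_relevance(X, Y, fit_base):
    """Binary Relevance: train one independent binary model per label column."""
    n_labels = len(Y[0])
    models = [fit_base(X, [row[j] for row in Y]) for j in range(n_labels)]
    return lambda x: tuple(m(x) for m in models)


# Invented toy data: feature 0 tracks "satirical", feature 1 tracks "fake".
X = [(0.1, 0.2), (0.2, 0.9), (0.9, 0.1), (0.8, 0.8),
     (0.15, 0.15), (0.25, 0.85), (0.85, 0.2), (0.9, 0.9)]
Y = [(0, 0), (0, 1), (1, 0), (1, 1),
     (0, 0), (0, 1), (1, 0), (1, 1)]

predict = fit_binary_relevance(X, Y, fit_voting_ensemble)
print(predict((0.05, 0.95)))  # a (satirical, fake) pair of 0/1 predictions
```

Because each label gets its own model, the two decisions (objective/satirical and legitimate/fake) are made independently, which is the defining property of Binary Relevance. The per-stump randomness followed by a vote mirrors the intuition given above for why such ensembles tend to resist noise.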

Summary

Introduction

The way citizens consume, interpret, and process news has changed substantially, mainly due to the ease of communication on social networks. This tradition was not free from ideological biases or even deliberate distortions arising from various factors: from pure sensationalism, which aimed to increase sales (or audience) through exaggerated emotional appeal; to overpowering or defamatory news commissioned by politicians interested in interfering in public opinion; to mere negligence in newsgathering, the result of unfavorable working conditions in a context of industrial production of journalism.

iSys: Revista Brasileira de Sistemas de Informação (Brazilian Journal of Information Systems), 13(4), 126-149. https://sol.sbc.org.br/journals/index.php/isys

Methods

Results

Conclusion