Fake News and Imbalanced Data Perspective

Isha Y Agarwal, Dipti P Rana

doi:10.4018/978-1-7998-7371-6.ch011

Abstract

Fake news has recently attracted considerable attention. This chapter approaches the issue from the perspective of collecting quality data (i.e., instances of fake and real news articles with a balanced distribution of subjects). It is predicted that in the near future fake news will outnumber true news, which will create a natural imbalance of data in the media ecosystem. Because of the unbounded scale and the imbalance of the data, detecting fake news is challenging. The class imbalance problem in fake news detection has yet to be explored; it arises because, in some cases, fake news instances outnumber real news instances. The goal of this chapter is to demonstrate the effect of class imbalance between real and fake news instances on detection using classification models. This work aims to help researchers better address the problem by illustrating the precise relationship between the imbalance and its impact on the output of the classifier. In particular, the authors determine that data imbalance and accuracy are inversely proportional.
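
To make the two quantities in the abstract concrete, the following is a minimal sketch of how the degree of class imbalance and the classifier output might be measured side by side. It is not the authors' method: the function names `imbalance_ratio` and `evaluate`, the label encoding (1 for fake, 0 for real), and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (1.0 means balanced)."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    return max(counts.values()) / min(counts.values())


def evaluate(y_true, y_pred, fake=1):
    """Accuracy, per-class recall, and balanced accuracy for binary labels.

    Assumes (hypothetically) that `fake` marks fake news and any other
    label marks real news.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == fake and p == fake)
    tn = sum(1 for t, p in pairs if t != fake and p != fake)
    fp = sum(1 for t, p in pairs if t != fake and p == fake)
    fn = sum(1 for t, p in pairs if t == fake and p != fake)
    recall_fake = tp / (tp + fn) if tp + fn else 0.0
    recall_real = tn / (tn + fp) if tn + fp else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall_fake": recall_fake,
        "recall_real": recall_real,
        "balanced_accuracy": (recall_fake + recall_real) / 2,
    }


# Toy data (illustrative only): 8 fake and 2 real articles, and a
# hypothetical classifier that labels every article as fake.
y_true = [1] * 8 + [0] * 2
y_pred = [1] * 10

print(imbalance_ratio(y_true))
print(evaluate(y_true, y_pred))
```

Reporting per-class recall alongside accuracy makes it visible when a classifier simply favors the majority class, which is one reason the imbalance ratio is worth tracking together with the evaluation metrics.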

Full Text