Online Social Media platforms (OSMs) have become an essential source of information. The high speed at which OSM users submit data makes moderation extremely hard. Consequently, besides offering online networking to users, OSMs have also become carriers of fake news. Knowingly or unknowingly, users circulate fake news on OSMs, adversely affecting individuals’ offline activity. To counter fake news, several dedicated websites (referred to as fact-checkers) have sprung up whose sole purpose is to identify and report fake news incidents. Well-known datasets of fake news exist; however, little work has been done on credible datasets of fake news in India. In this work, we therefore design an automated data collection pipeline to collect fake news incidents reported by fact-checkers. We gather 4,803 fake news incidents from June 2016 to December 2019 reported by six popular fact-checking websites in India and make this dataset (FakeNewsIndia) available to the research community. We find 5,031 tweets on Twitter and 866 videos on YouTube mentioned in these 4,803 fake news incidents. Further, we evaluate the impact of fake news incidents on two prominent OSM platforms, namely Twitter and YouTube. We use popularity metrics based on engagement rate and likes ratio to measure impact, and we categorize impact into three levels: low, medium, and high. Our learning models use features extracted from the text, images, and videos present in the fake news incident articles written by fact-checking websites. Experiments show that we can predict the impact (popularity) of videos appearing in fake news incident articles on YouTube more accurately (with baseline accuracy ranging from 86% to 92%) than the impact (popularity) of tweets on Twitter (with baseline accuracy ranging from 37% to 41%). As future work, we need to build more intelligent models that predict the impact of tweets appearing in fact-checking incident articles.
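The impact categorization described above can be illustrated with a minimal sketch. The abstract does not define the metrics, so everything here is an assumption made for illustration: engagement rate is taken as likes plus retweets per follower, likes ratio as likes over likes plus dislikes, and the cut-off values passed to `impact_level` are hypothetical rather than the thresholds used in the study.

```python
def engagement_rate(likes: int, retweets: int, followers: int) -> float:
    """Assumed definition: interactions per follower (not taken from the paper)."""
    if followers <= 0:
        return 0.0
    return (likes + retweets) / followers


def likes_ratio(likes: int, dislikes: int) -> float:
    """Assumed definition: share of likes among all like/dislike reactions."""
    total = likes + dislikes
    if total == 0:
        return 0.0
    return likes / total


def impact_level(score: float, low_cut: float, high_cut: float) -> str:
    """Map a popularity score to one of three levels using two cut-offs.

    The cut-offs are hypothetical; in practice they might be chosen from
    the score distribution of the collected tweets or videos.
    """
    if score < low_cut:
        return "low"
    if score < high_cut:
        return "medium"
    return "high"


# Illustrative values only, not data from FakeNewsIndia.
tweet_score = engagement_rate(likes=30, retweets=20, followers=1000)
video_score = likes_ratio(likes=90, dislikes=10)
print(impact_level(tweet_score, low_cut=0.01, high_cut=0.10))
print(impact_level(video_score, low_cut=0.50, high_cut=0.80))
```

Under these assumptions, each tweet or video receives a single score that is bucketed into low, medium, or high, and that label would serve as the target class for a three-way classifier.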