Abstract

This study presents a new dataset for fake news analysis and detection, namely, the PolitiFact-Oslo Corpus. The corpus contains samples of both fake and real news in English, collected from the fact-checking website PolitiFact.com. It grew out of the need for a more controlled and reliable dataset, grounded in recent events, for fake news analysis and the development of detection models. Three features make it uniquely suited to this purpose: (i) the texts have been individually labelled for veracity by experts, (ii) they are complete texts that strictly correspond to the claims in question, and (iii) they are accompanied by important metadata such as text type (e.g., social media, news, blog). In relation to this, we present a pipeline for collecting quality data from major fact-checking websites, a procedure that can be replicated in future corpus-building efforts. An exploratory analysis based on sentiment and part-of-speech information reveals interesting differences between fake and real news, as well as between text types, highlighting the importance of adding contextual information to fake news corpora. Since the main application of the PolitiFact-Oslo Corpus is automatic fake news detection, we critically examine its applicability, together with that of another PolitiFact dataset built on less strict criteria, to several efficient deep learning-based approaches: Long Short-Term Memory (LSTM) and Bidirectional LSTM (Bi-LSTM) networks, and fine-tuned transformers such as Bidirectional Encoder Representations from Transformers (BERT), RoBERTa, and XLNet.
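To make the detection setting concrete, the sketch below shows one way to fine-tune a transformer for binary fake/real classification of the kind the abstract describes. It is an illustrative example, not the authors' implementation: the file name `politifact_oslo.csv`, the column names `text` and `label`, and the hyperparameters are assumptions, and RoBERTa or XLNet can be swapped in by changing the model name.

```python
# Minimal sketch: fine-tuning a transformer for fake/real news classification.
# Assumes a hypothetical CSV export with columns "text" and "label" (0 = real, 1 = fake).
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"  # e.g. "roberta-base" or "xlnet-base-cased" instead

class NewsDataset(Dataset):
    """Tokenized (text, label) pairs for sequence classification."""
    def __init__(self, texts, labels, tokenizer, max_len=256):
        self.enc = tokenizer(list(texts), truncation=True, padding="max_length",
                             max_length=max_len, return_tensors="pt")
        self.labels = torch.tensor(list(labels))

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = self.labels[i]
        return item

def fine_tune(csv_path="politifact_oslo.csv", epochs=2, lr=2e-5):
    df = pd.read_csv(csv_path)  # hypothetical file layout
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
    loader = DataLoader(NewsDataset(df["text"], df["label"], tokenizer),
                        batch_size=16, shuffle=True)
    optim = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for batch in loader:
            optim.zero_grad()
            loss = model(**batch).loss  # cross-entropy over the two classes
            loss.backward()
            optim.step()
    return model
```

In practice the corpus would be split into training and held-out evaluation sets before fine-tuning; the loop above only illustrates the basic training step shared by the BERT-family models mentioned in the abstract.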
