Exploring COVID-related relationship extraction: Contrasting data sources and analyzing misinformation

Tanvi Sharma, Amer Farea, Nadeesha Perera, Frank Emmert-Streib

doi:10.1016/j.heliyon.2024.e26973

Open Access

Journal: Heliyon
Publication Date: Feb 28, 2024
License type: CC BY

Affiliation: Tampere University

Abstract

The COVID-19 pandemic presented an unparalleled challenge to global healthcare systems. A central issue is the urgent need to rapidly gather critical biological and medical knowledge about the disease, its treatment, and its containment. Remarkably, text data remains an underutilized resource in this context. In this paper, we study the extraction of COVID-related relations using transformer-based language models, including Bidirectional Encoder Representations from Transformers (BERT) and DistilBERT. We evaluate the performance of five language models, compare information from PubMed and Reddit, and assess the models' ability to make novel predictions, including the detection of “misinformation.” Our key findings show that, despite inherent differences between the two sources, PubMed and Reddit data contain remarkably similar information. This suggests that Reddit can serve as a valuable resource for rapidly acquiring information during times of crisis. Furthermore, our results demonstrate that language models can uncover previously unseen entities and relations, a capability that is crucial for identifying instances of misinformation.