Covid-on-the-Web: Exploring the COVID-19 scientific literature through visualization of linked data from entity and argument mining

Aline Menin,Fabien Gandon,Elena Cabrio,Alain Giboin,Olivier Corby,Tobias Mayer,Serena Villata,Franck Michel,Santiago Marro,Marco Winckler,Raphaël Gazzotti

doi:10.1162/qss_a_00164

Abstract

The unprecedented mobilization of scientists caused by the COVID-19 pandemic has generated an enormous number of scholarly articles that are impossible for a human being to keep track of and explore without appropriate tool support. In this context, we created the Covid-on-the-Web project, which aims to assist the accessing, querying, and sense-making of COVID-19-related literature by combining efforts from the semantic web, natural language processing, and visualization fields. In particular, in this paper we present an RDF data set (a linked version of the “COVID-19 Open Research Dataset” (CORD-19), enriched via entity linking and argument mining) and the “Linked Data Visualizer” (LDViz), which assists the querying and visual exploration of this data set. The LDViz tool supports the exploration of different views of the data by combining a query management interface, which enables the definition of meaningful subsets of data through SPARQL queries, and a visualization interface based on a set of six visualization techniques integrated in a chained visualization concept, which also supports the tracking of provenance information. Through use case scenarios, we demonstrate the potential of our approach to assist biomedical researchers in solving domain-related tasks and in performing exploratory analyses.

Highlights

The COVID-19 pandemic motivated the scientific community from numerous fields of research to contribute to a common effort to study, understand, and fight the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)
Future work includes integrating this visualization interface into the LDViz tool, where the user could analyze and explore meaningful data defined via SPARQL queries, similarly to what is done with MGExplorer, resulting in a completely integrated tool for extracting and exploring knowledge from scientific literature through various perspectives
Based on the needs of biomedical researchers, partners of the project, we designed and published a linked data knowledge graph describing the named entities mentioned in the articles of the CORD-19 corpus and the argumentative graphs they include

Summary

Introduction

The COVID-19 pandemic motivated the scientific community from numerous fields of research to contribute to a common effort to study, understand, and fight the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Our first contribution, the Covid-on-the-Web RDF dataset, includes and enriches over 100,000 full-text scholarly articles from the 47th version of the CORD-19 corpus, which corresponds to 1.3 billion RDF triples describing the articles’ metadata, an argumentation knowledge graph, and a named entity (NE) knowledge graph. The second contribution is LDViz, a visualization tool that enables the exploration of the COVID-19 scientific literature from different perspectives, such as co-authorship, named entity co-occurrence, and the relationship between claims and evidence within publications. Although there have been previous contributions in exploring the CORD-19 corpus through entity linking approaches (e.g., Oniani et al., 2020; Reese et al., 2021), to the best of our knowledge the Covid-on-the-Web dataset is the first to integrate NEs, arguments, and PICO components into a single, coherent whole.
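
To make the named entity co-occurrence perspective concrete, the following minimal Python sketch counts how many articles mention each pair of entities together. It is an illustration only and not the LDViz implementation: the article identifiers, the entity labels, and the function name `cooccurrence_counts` are hypothetical, and in practice the input would come from a SPARQL query over the dataset rather than from a hard-coded dictionary.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy input: article identifier -> named entities linked in it.
# These identifiers and labels are made up for illustration; they are not
# taken from the Covid-on-the-Web dataset.
article_entities = {
    "article-1": ["SARS-CoV-2", "ACE2", "pneumonia"],
    "article-2": ["SARS-CoV-2", "ACE2"],
    "article-3": ["SARS-CoV-2", "pneumonia", "ACE2", "ACE2"],
}


def cooccurrence_counts(entities_by_article):
    """Count, for each unordered entity pair, the articles mentioning both."""
    counts = Counter()
    for entities in entities_by_article.values():
        # Deduplicate and sort so that each pair is counted once per article
        # and always appears in the same order.
        unique = sorted(set(entities))
        counts.update(combinations(unique, 2))
    return counts


counts = cooccurrence_counts(article_entities)
for (left, right), n in counts.most_common():
    print(f"{left} -- {right}: {n}")
```

Each counted pair can be read as a weighted edge in a co-occurrence graph, which is the kind of structure a node-link visualization would display. The same counting scheme would apply to co-authorship by replacing entity lists with author lists.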

Related Work

Discussion

Conclusion