Strategies for Assembling the Biodiversity Knowledge Graph

Roderic Page

doi:10.3897/biss.4.59126

Abstract

This talk explores different strategies for assembling the “biodiversity knowledge graph” (Page 2016). The first is a centralised, crowd-sourced approach using Wikidata as the foundation. Wikidata is becoming increasingly attractive as a knowledge graph for the life sciences (Waagmeester et al. 2020), and I will discuss some of its strengths and limitations, particularly as a source of bibliographic and taxonomic information. For example, Wikidata’s handling of taxonomy is somewhat problematic given the lack of clear separation of taxa and their names. A second approach is to build biodiversity knowledge graphs from scratch, such as OpenBioDiv (Penev et al. 2019) and my own Ozymandias (Page 2019). These approaches use either generalised vocabularies such as schema.org, or domain specific ones such as TaxPub (Catapano 2010) and the Semantic Publishing and Referencing Ontologies (SPAR) (Peroni and Shotton 2018), and to date tend to have restricted focus, whether geographic (e.g., Australian animals in Ozymandias) or temporal (recent taxonomic literature, OpenBioDiv). A growing number of data sources are now using schema.org to describe their data, including ORCID and Zenodo, and efforts to extend schema.org into biology (Bioschemas) suggest we may soon be able to build comprehensive knowledge graphs using just schema.org and its derivatives. A third approach is not to build an entire knowledge graph, but instead focus on constructing small pieces of the graph tightly linked to supporting evidence, for example via annotations. Annotations are increasingly used to mark up both the biomedical literature (e.g., Kim et al. 2015, Venkatesan et al. 2017) and the biodiversity literature (Batista-Navarro et al. 2017). One could argue that taxonomic databases are essentially lists of annotations (“this name appears in this publication on this page”), which suggests we could link literature projects such as the Biodiversity Heritage Library (BHL) to taxonomic databases via annotations. Given that the International Image Interoperability Framework (IIIF) provides a framework for treating publications themselves as a set of annotations (e.g., page images) upon which other annotations can be added (Zundert 2018), this suggests ways that knowledge graphs could lead directly to visualising the links between taxonomy and the taxonomic literature. All three approaches will be discussed, accompanied by working examples.

Highlights

A third approach is not to build an entire knowledge graph, but instead focus on constructing small pieces of the graph tightly linked to supporting evidence, for example via annotations
One could argue that taxonomic databases are essentially lists of annotations (“this name appears in this publication on this page”), which suggests we could link literature projects such as the Biodiversity Heritage Library (BHL) to taxonomic databases via annotations
Biodiversity Heritage Library, OpenBioDiv, Ozymandias, schema.org, annotations, biodiversity literature, taxonomic literature, Wikidata

Summary

Introduction

A third approach is not to build an entire knowledge graph, but instead focus on constructing small pieces of the graph tightly linked to supporting evidence, for example via annotations.

Results

Conclusion