OpenBiodiv: A Knowledge Graph for Literature-Extracted Linked Open Data in Biodiversity Science

Lyubomir Penev, Mariya Dimitrova, Kiril Simov, Viktor Senderov, Teodor Georgiev, Georgi Zhelezov, Pavel Stoev

doi:10.3390/publications7020038

Abstract

Hundreds of years of biodiversity research have resulted in the accumulation of a substantial pool of communal knowledge; however, most of it is stored in silos isolated from each other, such as published articles or monographs. The need for a system to store and manage collective biodiversity knowledge in a community-agreed and interoperable open format has evolved into the concept of the Open Biodiversity Knowledge Management System (OBKMS). This paper presents OpenBiodiv: An OBKMS that utilizes semantic publishing workflows, text and data mining, common standards, ontology modelling and graph database technologies to establish a robust infrastructure for managing biodiversity knowledge. It is presented as a Linked Open Dataset generated from scientific literature. OpenBiodiv encompasses data extracted from more than 5000 scholarly articles published by Pensoft and many more taxonomic treatments extracted by Plazi from journals of other publishers. The data from both sources are converted to Resource Description Framework (RDF) and integrated in a graph database using the OpenBiodiv-O ontology and an RDF version of the Global Biodiversity Information Facility (GBIF) taxonomic backbone. Through the application of semantic technologies, the project showcases the value of open publishing of Findable, Accessible, Interoperable, Reusable (FAIR) data towards the establishment of open science practices in the biodiversity domain.
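The conversion step mentioned in the abstract, from extracted article metadata to RDF statements, can be pictured with a minimal sketch. It is not the OpenBiodiv codebase: the `EX` namespace, the predicate names and the record layout are hypothetical placeholders rather than terms of OpenBiodiv-O. Only the DOI, title and author names are taken from this article. Plain N-Triples strings are used so that no RDF library is required.

```python
# Illustrative only: namespace, predicates and record layout are assumptions,
# not the vocabulary of OpenBiodiv-O.
EX = "http://example.org/openbiodiv-sketch/"


def literal(value: str) -> str:
    """Serialize a string as an N-Triples literal, escaping special characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def article_to_ntriples(record: dict) -> list[str]:
    """Turn one extracted article record into N-Triples statements."""
    subject = f"<{EX}article/{record['id']}>"
    triples = [
        f"{subject} <{EX}hasDOI> {literal(record['doi'])} .",
        f"{subject} <{EX}hasTitle> {literal(record['title'])} .",
    ]
    for name in record.get("authors", []):
        triples.append(f"{subject} <{EX}hasAuthorName> {literal(name)} .")
    return triples


record = {
    "id": "a1",  # hypothetical local identifier
    "doi": "10.3390/publications7020038",
    "title": "OpenBiodiv: A Knowledge Graph for Literature-Extracted "
             "Linked Open Data in Biodiversity Science",
    "authors": ["Lyubomir Penev", "Mariya Dimitrova"],
}

for line in article_to_ntriples(record):
    print(line)
```

Each printed line is one subject-predicate-object statement. In a real pipeline such statements would be loaded into a graph database and typed with ontology classes instead of ad hoc predicates.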

Highlights

This paper describes the rationale, concept, infrastructure and underlying data of OpenBiodiv, the first linked open data (LOD)-based Open Biodiversity Knowledge Management System (OBKMS), which integrates knowledge extracted from biodiversity publications with the taxonomic backbone used by the Global Biodiversity Information Facility (GBIF).
Scientific names used in these publications were mapped to GBIF’s taxonomic backbone, which has been converted into Resource Description Framework (RDF) and integrated in OpenBiodiv.
OpenBiodiv was realized as an Open Biodiversity Knowledge Management System through the creation of four components: a semantic database containing a Linked Open Dataset based on the OpenBiodiv-O ontology [28]; a codebase for automatic transformation of literature into RDF statements; a website [43] providing a frontend to the database; and a SPARQL endpoint [42] (Figure 1).
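The name-mapping step in the highlights above can be sketched as a lookup of normalized name strings against a backbone. This is a simplified illustration under stated assumptions: the `BACKBONE` dictionary, its `backbone:` identifiers and the two-token normalization rule are all hypothetical. They do not reproduce real GBIF keys or the matching logic used by OpenBiodiv.

```python
import re

# Hypothetical stand-in for an RDF-converted taxonomic backbone:
# canonical name -> made-up identifier (not real GBIF keys).
BACKBONE = {
    "apis mellifera": "backbone:0001",
    "harmonia axyridis": "backbone:0002",
}


def canonical(name: str) -> str:
    """Reduce a name string to a lowercase 'genus epithet' key.

    Keeps the first two whitespace-separated tokens, so trailing
    authorship such as 'Linnaeus, 1758' is ignored. Real name parsing
    is far more involved; this is a deliberate simplification.
    """
    tokens = re.sub(r"\s+", " ", name.strip()).split(" ")
    return " ".join(tokens[:2]).lower()


def map_to_backbone(names):
    """Return (matched, unmatched) for the names found in a text."""
    matched, unmatched = {}, []
    for name in names:
        key = canonical(name)
        if key in BACKBONE:
            matched[name] = BACKBONE[key]
        else:
            unmatched.append(name)
    return matched, unmatched


matched, unmatched = map_to_backbone(
    ["Apis mellifera Linnaeus, 1758", "Harmonia  axyridis", "Aus bus"]
)
print(matched)
print(unmatched)
```

Keeping unmatched names in a separate list, rather than dropping them, leaves room for later manual review or fuzzy matching.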

Summary

Introduction

Biodiversity science studies and describes the diversity of living organisms on Earth. It is an interdisciplinary field that encompasses knowledge from multiple domains: taxonomy, genomics, biogeography, ecology, phylogenetics and others. Developing mechanisms for the storage and management of such diverse and rich information is of particular importance for biology and several other areas of research and practice [1,2,3]. Attempts to create a system and standards for integrating biodiversity knowledge date back to the establishment of the Biodiversity Informatics Standards organization in 1985.

Introduction. The Introduction section of an article.
Identifier (DOI). The DOI of an article.
Author Name. The name of an author of an article.
Treatment.

Methods

Results

Discussion

Conclusion