The OpenBiodiv Knowledge Graph Rebuilt: A semantic hub on top of the ARPHA-published content and the Biodiversity Literature Repository

Lyubomir Penev, Georgi Zhelezov, Teodor Georgiev, Mariya Dimitrova

doi:10.3897/biss.6.91357

Abstract

OpenBiodiv is a complex ecosystem of tools and services for converting the XML narratives of biodiversity articles, including Darwin Core data, into RDF-based Linked Open Data (LOD), running on top of a graph database. OpenBiodiv provides four main types of services:

1. Searching named entities (e.g., taxon names, taxon concepts, treatments, specimens, occurrences, gene sequences, bibliographic information, institutions, persons) in context, within and between articles.
2. Answering questions based on the presence of certain named entities within specific article sections (e.g., titles, abstracts, introduction or other sections, taxon treatments).
3. Identifying article sections for further text processing (NLP) and providing contextual information, stored in MongoDB.
4. Federating the SPARQL endpoint with other triple stores to enrich the discovered knowledge.

Conversion of such data into RDF follows a general semantic model expressed in the OpenBiodiv-O ontology, an extension of the Treatment Ontology for knowledge representation of current and legacy biodiversity publications (Senderov et al. 2018). The conversion uses two main sources: the full-text article XML published on the ARPHA Publishing Platform, and the taxon treatments extracted by Plazi's TreatmentBank from more than 100 biodiversity journals and stored in the Biodiversity Literature Repository at Zenodo.
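As a rough illustration of what "conversion into RDF" yields, the following Python sketch maps a simplified treatment record to N-Triples lines. Only `rdf:type` and the Darwin Core term `dwc:scientificName` are real vocabulary; the `http://example.org/openbiodiv-sketch/` namespace and its `Treatment`, `mentions` and `inSection` terms are hypothetical placeholders, not OpenBiodiv-O classes or properties, and the shape of the input record is an assumption.

```python
from typing import Dict, List
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DWC_SCIENTIFIC_NAME = "http://rs.tdwg.org/dwc/terms/scientificName"
# Hypothetical namespace standing in for the real OpenBiodiv-O terms.
EX = "http://example.org/openbiodiv-sketch/"


def iri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping characters N-Triples reserves."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def treatment_to_ntriples(record: Dict[str, str]) -> List[str]:
    """Map a simplified treatment record (hypothetical shape) to N-Triples lines."""
    local_id = quote(record["id"], safe="")  # percent-encode the id for use in an IRI
    treatment = iri(f"{EX}treatment/{local_id}")
    name_node = iri(f"{EX}taxon-name/{local_id}")
    lines = [
        f"{treatment} {iri(RDF_TYPE)} {iri(EX + 'Treatment')} .",
        f"{treatment} {iri(EX + 'mentions')} {name_node} .",
        f"{name_node} {iri(DWC_SCIENTIFIC_NAME)} {literal(record['scientificName'])} .",
    ]
    # Optionally record which article section the treatment came from.
    if "section" in record:
        lines.append(f"{treatment} {iri(EX + 'inSection')} {literal(record['section'])} .")
    return lines
```

For a record such as `{"id": "T1", "scientificName": "Aus bus"}` the function returns three triples. Keeping the name literal on its own node, rather than directly on the treatment, leaves room to link that node to a taxon concept later.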
To ensure efficiency, quality control, and fast tracking of all stages, the entire process of extraction, conversion to RDF, and indexing of the content has been rebuilt on the Apache Kafka event streaming platform (Fig. 1). In this new form, OpenBiodiv not only provides a GraphDB SPARQL query endpoint but also indexes the named entities through Elasticsearch and delivers data to end users through a RESTful API and a number of user applications. OpenBiodiv is designed for a wide range of users interested in deep-level bibliographic exploration, ontology-linked search of various data elements (e.g., specimens, sequences, taxon concepts, persons), or the co-occurrence of named entities (e.g., taxon names with possible biotic relationships between them, or taxon names and the habitats they may occupy) in predefined sections of the articles. The SPARQL endpoint allows complex queries of various kinds (Dimitrova et al. 2021).
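The staged, event-driven design can be illustrated with a small, self-contained Python sketch. It does not use Kafka, GraphDB or Elasticsearch: append-only lists stand in for Kafka topics, a set of tuples stands in for the triple store, and a dictionary stands in for the Elasticsearch index. The topic and consumer names, the `ex:` prefixes and the simplified `<article>/<sec>/<taxon-name>` XML are hypothetical and not taken from OpenBiodiv.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import DefaultDict, Set, Tuple

Mention = Tuple[str, str, str]  # (article id, section type, taxon name)


class MiniPipeline:
    """In-memory stand-in for a Kafka-based extract -> RDF -> index pipeline.

    Each topic is an append-only list and each consumer keeps its own offset,
    loosely mirroring how independent Kafka consumer groups read a topic.
    Topic names, consumer names and the XML shape are hypothetical.
    """

    def __init__(self) -> None:
        self.topics: DefaultDict[str, list] = defaultdict(list)
        self.offsets: DefaultDict[Tuple[str, str], int] = defaultdict(int)
        self.graph: Set[Tuple[str, str, str]] = set()  # stands in for GraphDB
        # Stands in for Elasticsearch: taxon name -> {(article id, section)}
        self.index: DefaultDict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def publish(self, topic: str, message) -> None:
        self.topics[topic].append(message)

    def poll(self, consumer: str, topic: str) -> list:
        """Return messages this consumer has not seen yet and advance its offset."""
        start = self.offsets[(consumer, topic)]
        batch = self.topics[topic][start:]
        self.offsets[(consumer, topic)] = start + len(batch)
        return batch

    def extract(self) -> None:
        """Stage 1: read article XML, publish one message per taxon-name mention."""
        for xml_text in self.poll("extractor", "articles"):
            root = ET.fromstring(xml_text)
            for sec in root.findall("sec"):  # direct <sec> children only
                for name in sec.findall("taxon-name"):
                    mention: Mention = (root.get("id"), sec.get("type"), name.text.strip())
                    self.publish("entities", mention)

    def convert_to_rdf(self) -> None:
        """Stage 2: turn mentions into (subject, predicate, object) triples."""
        for article_id, section, name in self.poll("rdf-converter", "entities"):
            self.graph.add((f"ex:{article_id}/{section}", "ex:mentions", name))

    def index_entities(self) -> None:
        """Stage 3: build an inverted index from taxon name to (article, section)."""
        for article_id, section, name in self.poll("indexer", "entities"):
            self.index[name].add((article_id, section))

    def run_once(self) -> None:
        self.extract()
        self.convert_to_rdf()
        self.index_entities()


def cooccurring_sections(pipe: MiniPipeline, name_a: str, name_b: str) -> Set[Tuple[str, str]]:
    """Sections in which both taxon names are mentioned (a co-occurrence query)."""
    return pipe.index[name_a] & pipe.index[name_b]
```

Because the RDF converter and the indexer each keep their own offset on the `entities` topic, both read the same stream independently, and calling `run_once()` again does not reprocess messages that were already consumed. The `cooccurring_sections` helper hints at the kind of section-level co-occurrence question the abstract describes.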

Journal: Biodiversity Information Science and Standards
Publication Date: Aug 23, 2022
Citations: 1
License type: CC BY 4.0

Similar Papers

OpenBiodiv for Users: Applications and Approaches to Explore a Biodiversity Knowledge Graph
Lyubomir Penev ... Iva Boyadzhieva
Biodiversity Information Science and Standards | VOL. 7
09 Aug 2023

Nanopublications for Biodiversity Go Live
Lyubomir Penev ... Iva Boyadzhieva
Biodiversity Information Science and Standards | VOL. 7
09 Aug 2023

Improved Sharing and Linkage of Taxonomic Data with the Taxon Concept Standard (TCS)
Niels Klazenga
Biodiversity Information Science and Standards | VOL. 7
05 Sep 2023

WF.ACTIAS: A workflow for a better integration of biodiversity data from diverse sources
Liliana Ballesteros Mejia ... Sujeevan Ratnasingham
Biodiversity Information Science and Standards | VOL. 3
18 Jun 2019
