Assisting Biologists in Editing Taxonomic Information by Confronting Multiple Data Sources using Linked Data Standards

Franck Michel, Olivier Gargominy, Catherine Faron-Zucker, Sandrine Tercerie, Antonia Ettorre

doi:10.3897/biss.3.37421

Abstract

During the last decade, Web APIs (Application Programming Interfaces) have gained significant traction, to the extent that they have become a de facto standard for HTTP-based, machine-processable data access. Despite this success, however, they still often fail to make data interoperable, insofar as they commonly rely on proprietary data models and vocabularies that lack the formal semantic descriptions essential to reliable data integration. In the biodiversity domain, multiple data aggregators, such as the Global Biodiversity Information Facility (GBIF) and the Encyclopedia of Life (EoL), maintain specialized Web APIs giving access to billions of records about taxonomies, occurrences, or life traits (Triebel et al. 2012). They publish data sets spanning complementary and often overlapping regions, epochs or domains, but may also report or rely on potentially conflicting perspectives, e.g. with respect to the circumscription of taxonomic concepts. It is therefore of utmost importance for biologists and collection curators to be able to confront the knowledge they have about taxa with related data coming from third-party data sources. To tackle this issue, the French National Museum of Natural History (MNHN) has developed an application to edit TAXREF, the French taxonomic register for fauna, flora and fungi (Gargominy et al. 2018). TAXREF registers all species recorded in metropolitan France and overseas territories, accounting for 260,000+ biological taxa (200,000+ species) along with 570,000+ scientific names. The TAXREF-Web application compares data available in TAXREF with corresponding data from third-party data sources, points out disagreements and allows biologists to add, remove or amend TAXREF data accordingly. This requires that TAXREF-Web developers write a specific piece of code for each considered Web API to align the TAXREF representation with its Web API counterpart.
This task is time-consuming and makes maintenance of the web application cumbersome. In this presentation, we report on a new implementation of TAXREF-Web that harnesses the Linked Data standards: Resource Description Framework (RDF), the Semantic Web format to represent knowledge graphs, and SPARQL, the W3C standard to query RDF graphs. In addition, we leverage the SPARQL Micro-Service architecture (Michel et al. 2018), a lightweight approach to query Web APIs using SPARQL. A SPARQL micro-service is a SPARQL endpoint that wraps a Web API service; it typically produces a small, resource-centric RDF graph by invoking the Web API and transforming the response into RDF triples. We developed SPARQL micro-services to wrap the Web APIs of GBIF, World Register of Marine Species (WoRMS), FishBase, Index Fungorum, Pan-European Species directories Infrastructure (PESI), ZooBank, International Plant Names Index (IPNI), EoL, Tropicos and Sandre. These micro-services consistently translate Web API responses into RDF graphs utilizing mainly two well-adopted vocabularies: Schema.org (Guha et al. 2015) and Darwin Core (Baskauf et al. 2015). This approach brings about two major advantages. First, the wide adoption of Schema.org and Darwin Core ensures that the services can be immediately understood and reused by a broad audience within the biodiversity community. Second, wrapping all these Web APIs in SPARQL micro-services “suddenly” makes them technically and semantically interoperable, since they all represent resources (taxa, habitats, traits, etc.) in a common manner. Consequently, the integration task is simplified: confronting data from multiple sources essentially consists of writing the appropriate SPARQL queries, which makes web application development and maintenance easier.
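As an illustration of the transformation step described above, the following minimal Python sketch turns a JSON response into RDF triples that use Darwin Core and Schema.org terms, then serializes them as N-Triples. It is not the TAXREF-Web or SPARQL micro-service implementation: the sample response, its field names, the field-to-property mapping, the `http://example.org/taxon/` base URI and the function names are all assumptions made for the example.

```python
import json

# Namespaces of the two vocabularies named in the text.
SCHEMA = "http://schema.org/"
DWC = "http://rs.tdwg.org/dwc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical Web API response; the field names are illustrative only
# and do not reproduce any particular provider's schema.
SAMPLE_RESPONSE = json.dumps({
    "id": "12345",
    "scientificName": "Delphinus delphis",
    "authorship": "Linnaeus, 1758",
    "rank": "species",
})

# Assumed mapping from JSON fields to Darwin Core properties.
FIELD_TO_PROPERTY = {
    "scientificName": DWC + "scientificName",
    "authorship": DWC + "scientificNameAuthorship",
    "rank": DWC + "taxonRank",
}


def response_to_triples(body, base_uri):
    """Translate one JSON record into (subject, predicate, object) triples."""
    record = json.loads(body)
    subject = base_uri + record["id"]
    triples = [
        (subject, RDF_TYPE, DWC + "Taxon"),
        (subject, SCHEMA + "identifier", record["id"]),
    ]
    for field, prop in FIELD_TO_PROPERTY.items():
        if field in record:
            triples.append((subject, prop, record[field]))
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples; only rdf:type objects are IRIs here."""
    lines = []
    for s, p, o in triples:
        if p == RDF_TYPE:
            obj = f"<{o}>"
        else:
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


if __name__ == "__main__":
    graph = response_to_triples(SAMPLE_RESPONSE, "http://example.org/taxon/")
    print(to_ntriples(graph))
```

Because every wrapped source would be mapped to the same properties, a client only needs to know the shared vocabulary terms, not each provider's JSON layout.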
We present several concrete cases in which we use this approach to detect disagreements between TAXREF and the aforementioned data sources, with respect to taxonomic information (author, synonymy, vernacular names, classification, taxonomic rank), habitats, bibliographic references, species interactions and life traits.
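To make the idea of detecting disagreements concrete, here is a minimal Python sketch that compares two descriptions of the same taxon once both are expressed with the same Darwin Core properties. It is a simplification under stated assumptions, not the method used in TAXREF-Web: the two records, their subject URIs and values, and the helper names `values_by_property` and `find_disagreements` are hypothetical, and the comparison is done in plain Python rather than with SPARQL queries.

```python
DWC = "http://rs.tdwg.org/dwc/terms/"

# Hypothetical triples about one taxon, as two sources might describe it
# after translation to a common vocabulary. Values are illustrative only.
REFERENCE_TRIPLES = [
    ("http://example.org/ref/1", DWC + "scientificName", "Delphinus delphis"),
    ("http://example.org/ref/1", DWC + "scientificNameAuthorship", "Linnaeus, 1758"),
    ("http://example.org/ref/1", DWC + "taxonRank", "species"),
]
OTHER_TRIPLES = [
    ("http://example.org/other/9", DWC + "scientificName", "Delphinus delphis"),
    ("http://example.org/other/9", DWC + "scientificNameAuthorship", "L., 1758"),
    ("http://example.org/other/9", DWC + "taxonRank", "species"),
]


def values_by_property(triples, subject):
    """Group the object values of one subject by property."""
    grouped = {}
    for s, p, o in triples:
        if s == subject:
            grouped.setdefault(p, set()).add(o)
    return grouped


def find_disagreements(reference, other):
    """Return the properties both sources state, but with different values."""
    disagreements = {}
    for prop in reference.keys() & other.keys():
        if reference[prop] != other[prop]:
            disagreements[prop] = (reference[prop], other[prop])
    return disagreements


if __name__ == "__main__":
    ref = values_by_property(REFERENCE_TRIPLES, "http://example.org/ref/1")
    oth = values_by_property(OTHER_TRIPLES, "http://example.org/other/9")
    for prop, (ours, theirs) in find_disagreements(ref, oth).items():
        print(f"{prop}: reference={sorted(ours)} other={sorted(theirs)}")
```

The sketch assumes the two subjects have already been matched as the same taxon; properties stated by only one source are ignored rather than reported.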


Corresponding author: Franck Michel (franck.michel@cnrs.fr)
Received: 17 Jun 2019 | Published: 26 Jun 2019
Citation: Michel F, Faron-Zucker C, Tercerie S, Ettorre A, Gargominy O (2019) Assisting Biologists in Editing Taxonomic Information by Confronting Multiple Data Sources using Linked Data Standards.


Journal: Biodiversity Information Science and Standards
Publication Date: Jun 26, 2019
License type: CC BY 4.0

