PeerJ Computer Science | VOL. 4
20 GB in 10 minutes: a case for linking major biodiversity databases using an open socio-technical infrastructure and a pragmatic, cross-institutional collaboration

Publication Date: Sep 17, 2018

Abstract

Biodiversity information is made available through numerous databases, each with its own data model, web services, and data types. Combining data across databases leads to new insights but is not easy, because each database uses its own system of identifiers. In the absence of stable and interoperable identifiers, databases are often linked using taxonomic names. This labor-intensive, error-prone, and lengthy process relies on accessible versions of nomenclatural authorities and on fuzzy-matching algorithms. Linking diverse data takes more than technology. Concrete progress on finding relationships between biodiversity datasets also requires new social collaborations such as the Global Unified Open Data Architecture (GUODA), which combines the skills of computer engineers from iDigBio, server resources from the Advanced Computing and Information Systems (ACIS) Lab, global-scale data presentation from EOL, and the contributions of independent developers and researchers. This paper discusses a technical solution developed by the GUODA collaboration for faster linking across databases, with a use case linking Wikidata and the Global Biotic Interactions database (GloBI). The GUODA infrastructure is a 12-node high-performance computing cluster with about 192 threads, 12 TB of storage, and 288 GB of memory. Using GUODA, 20 GB of compressed JSON from Wikidata was processed and linked to GloBI in about 10–11 min. Instead of comparing name strings or relying on a single identifier, Wi...
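The name-based linking that the abstract describes as error prone can be illustrated with a minimal sketch. It is not the GUODA method. It assumes two hypothetical lists of taxonomic names and uses Python's standard `difflib` for fuzzy matching; the function name `link_by_name` and the 0.9 similarity cutoff are illustrative choices, not values from the paper.

```python
from difflib import get_close_matches


def link_by_name(names_a, names_b, cutoff=0.9):
    """Map each name in names_a to its closest match in names_b, or None.

    Hypothetical helper: the cutoff is an assumed similarity threshold.
    """
    links = {}
    for name in names_a:
        # Keep only the single best candidate at or above the cutoff.
        matches = get_close_matches(name, names_b, n=1, cutoff=cutoff)
        links[name] = matches[0] if matches else None
    return links


# Hypothetical records from two databases that share no identifiers.
source_names = ["Puma concolor", "Pumma concolor", "Felis concolor"]
target_names = ["Puma concolor", "Panthera leo"]

links = link_by_name(source_names, target_names)
for source, target in links.items():
    print(f"{source!r} -> {target!r}")
```

In this sketch the misspelling "Pumma concolor" is recovered, but the historical synonym "Felis concolor" is left unlinked, because string similarity carries no nomenclatural knowledge. That gap is why name-based linking also depends on nomenclatural authorities.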

Concepts
Fuzzy-matching Algorithms
Cross-institutional Collaboration
Biodiversity Datasets
Consistency Metrics
GB Memory
Social Collaborations
Numerous Databases
Taxonomic Names
Global Open Data
Advanced Computing
