Landscape Analysis for the Specimen Data Refinery

Stephanie Walton, Alan Williams, Quentin Groom, Zhengzhe Wu, Robert Cubey, Ben Scott, Carole Goble, Christopher Kermorvant, Laurence Livermore, Olaf Bánki, Isabel Rey, Robyn Drinkwater, Markus Englund, Celia Santos

doi:10.3897/rio.6.e57602

Abstract

This report reviews the current state of the art in applied approaches to automated tools, services and workflows for extracting information from images of natural history specimens and their labels. We consider the potential for repurposing existing tools, including workflow management systems, and identify areas where more development is required. This paper was written as part of the SYNTHESYS+ project for software development and informatics teams working on new software-based approaches to improve mass digitisation of natural history specimens.

Highlights

A key limiting factor in organising and using information from global natural history specimens is making that information structured and computable
The tools evaluated in this landscape analysis include both unsupervised and supervised machine learning approaches, with a key difference being that unsupervised methods do not require a training dataset
As the Specimen Data Refinery is intended to integrate both artificial intelligence (AI) and human-in-the-loop (HitL) approaches to extraction and annotation, citizen science platforms such as plant identification apps and volunteer transcription services were included in the initial research
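The distinction drawn in the second highlight can be illustrated with a toy sketch. It is not taken from any tool evaluated in the report: the feature values (a hypothetical "fraction of dark pixels" per image region), the label names and all function names are assumptions made purely for illustration. The unsupervised routine groups the values without any labels, while the supervised routine cannot be trained without a labelled dataset.

```python
from statistics import mean


def kmeans_1d(values, k=2, iterations=10):
    """Unsupervised: group values into k >= 2 clusters using no labels."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids


def train_nearest_centroid(values, labels):
    """Supervised: learn one centroid per label from a labelled training set."""
    by_label = {}
    for v, label in zip(values, labels):
        by_label.setdefault(label, []).append(v)
    return {label: mean(vs) for label, vs in by_label.items()}


def predict(model, value):
    """Assign the label whose centroid is closest to the value."""
    return min(model, key=lambda label: abs(value - model[label]))


# Hypothetical feature: fraction of dark pixels in six image regions.
features = [0.05, 0.08, 0.10, 0.80, 0.85, 0.90]
# Hypothetical training labels, needed only by the supervised approach.
training_labels = ["background"] * 3 + ["label"] * 3

centroids = kmeans_1d(features)                           # no labels used
model = train_nearest_centroid(features, training_labels)  # labels required
prediction = predict(model, 0.7)
```

Both approaches find the same two groups here, but only the supervised model can name them, because the names come from the training labels.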

Summary

Introduction

A key limiting factor in organising and using information from global natural history specimens is making that information structured and computable. The objective of the Specimen Data Refinery (SDR) is to combine these technologies into a cloud-based platform for processing specimen images and their labels en masse in order to extract essential data efficiently and effectively, according to standard best practices. As part of this process a workflow was developed, illustrating the steps required to fully automate the procedure from image capture to a full specimen dataset (Fig. 1). This report does not include: technical evaluation of existing tools, service registries and platform-based approaches; evaluation and recommendations on using, integrating and merging partial (prior/previously created) specimen data; assessment of hardware and physical infrastructure requirements; assessment of the potential to use pan-European Collaborative Data Infrastructure; creation of reference/ground truth/training datasets.

Machine Learning and Training Data Sets

Prior Research on Automation

Crowdsourcing and Human-in-the-Loop

Project Context

Methodology

Gap Analysis

Image segmentation

Building a Workflow

Selecting a Human-in-the-Loop Workflow Management System

Implementing a standardised workflow language for interoperability

Incorporating prior information and the statistical framework

Assembling the workflow

The Specimen Data Refinery technology stack

Conclusion

Findings

Funding program

Journal: Research Ideas and Outcomes
Publication Date: Aug 14, 2020
Citations: 17
License type: CC BY 4.0

