Abstract
This report reviews the current state-of-the-art applied approaches on automated tools, services and workflows for extracting information from images of natural history specimens and their labels. We consider the potential for repurposing existing tools, including workflow management systems; and areas where more development is required. This paper was written as part of the SYNTHESYS+ project for software development teams and informatics teams working on new software-based approaches to improve mass digitisation of natural history specimens.
Highlights
A key limiting factor in organising and using information from global natural history specimens is making that information structured and computable
The tools evaluated in this landscape analysis include both unsupervised and supervised machine learning approaches, with a key difference being that unsupervised methods do not require a training dataset
As the Specimen Data Refinery is intended to integrate both artificial intelligence (AI) and human-in-the-loop (HitL) approaches to extraction and annotation, citizen science platforms such as plant identification apps and volunteer transcription services were included in the initial research
Summary
A key limiting factor in organising and using information from global natural history specimens is making that information structured and computable. The objective of the Specimen Data Refinery (SDR) is to combine these technologies into a cloud-based platform for processing specimen images and their labels en masse in order to extract essential data efficiently and effectively, according to standard best practices As part of this process a workflow was developed, illustrating the steps required to fully automate the procedure from image capture to a full specimen dataset (Fig. 1). This report does not include: technical evaluation of existing tools, service registries and platform-based approaches; evaluation and recommendations on using, integrating and merging partial (prior/previously created) specimen data; assessment of hardware and physical infrastructure requirements; assessment for the potential to use pan-European Collaborative Data Infrastructure; creation of reference/ground truth/training datasets
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.