Human and Machine Working Together towards High Quality Specimen Data: Annotation and Curation of the Digital Specimen

Sam Leeflang, Wouter Addink, Soulaine Theocharides

doi:10.3897/biss.6.90987

Open Access

Journal: Biodiversity Information Science and Standards
Publication Date: Aug 1, 2022
Citations: 2
License type: CC BY 4.0

Abstract

The engine for our Distributed System of Scientific Collections (DiSSCo) is running! The core technical components of this new research infrastructure are currently being implemented, and the engine that will drive it is already working. Even though some nuts and bolts may still be missing, we aim to show it in action and present how it will enable annotation and curation of the Digital Specimen. The Digital Specimen is a technical implementation based on FAIR Digital Objects (FAIR stands for Findable, Accessible, Interoperable and Reusable) that supports the Digital Extended Specimen concept (Webster et al. 2021). We will also demonstrate how we will implement standardized quality checks, as they are being developed in Biodiversity Information Standards (TDWG), to enhance the quality of the data.

DiSSCo is currently in its preparation phase, which will end in January 2023 with the completion of the DiSSCo Prepare project funded by the European Commission. Part of that project is the design of the Digital Specimen infrastructure, which is not an easy task given the wide range of use cases and stakeholders and the many possibilities the infrastructure offers. However, as we move towards the end of the project, we have defined clear goals and priorities to give shape to that infrastructure. This is where we take a fail-fast approach: quickly implement the proposed solution and see whether it really fits.

Based on collected user stories (Fitzgerald et al. 2021), one of the major needs we want to support with the Digital Specimen infrastructure is the provision of services for improving the quality and usability of specimen data. Our infrastructure aims to support annotation and community curation of the data by both machines and users. Examples of such annotations are image-based determinations, automated or citizen-science-contributed label translations, and semi-automated linking with other biodiversity data. Semi-automated linking is currently being piloted as part of the Biodiversity Community Integrated Knowledge Library project (BiCIKL) and will combine link prediction through artificial intelligence with human validation. Improvements in data quality made together by human and machine through the curation and annotation services will help produce a digital specimen data object with high-quality, curated and extended data.

As part of the presentation, we aim to give a live demonstration of the first setup, in which we will ingest a dataset and run standardized quality checks and automated data enrichment services. The end result will be a digital specimen, validated by quality checks and annotated by both a human and a machine, which we will present in a user-friendly interface. The result will also be accessible as a FAIR Digital Object through an API. During the demonstration, we aim to give the audience a clear view of how DiSSCo can help them create higher-quality specimen data, and how we will benefit in this process from the outputs of the TDWG Data Quality Tests and Assertions Task Group.
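To make the demonstrated flow more concrete, the following is a minimal sketch of how ingestion, a quality check, a machine enrichment and a human validation could each leave an annotation on a digital specimen. It is an illustration only and does not reflect the actual DiSSCo data model or API: the class names (`DigitalSpecimen`, `Annotation`), the functions, the `creator_type` and `motivation` values, and the tiny translation table are all assumptions made for this example. Only the field names `country` and `countryCode` are borrowed from Darwin Core.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Annotation:
    """Hypothetical annotation: who asserted what about which field."""
    target_field: str
    body: str
    creator: str       # a person or service identifier (made up here)
    creator_type: str  # "human" or "machine"
    motivation: str    # "quality-check", "enrichment" or "validation"


@dataclass
class DigitalSpecimen:
    """Toy stand-in for a digital specimen object, not the real model."""
    identifier: str
    data: dict
    annotations: List[Annotation] = field(default_factory=list)


# A machine step inspects a specimen and may return one annotation.
Step = Callable[[DigitalSpecimen], Optional[Annotation]]


def check_country_code(specimen: DigitalSpecimen) -> Optional[Annotation]:
    """Illustrative quality check: flag a code that is not two uppercase letters."""
    code = specimen.data.get("countryCode") or ""
    if len(code) == 2 and code.isascii() and code.isalpha() and code.isupper():
        return None
    return Annotation("countryCode", "missing or malformed country code",
                      "country-code-check", "machine", "quality-check")


def suggest_translation(specimen: DigitalSpecimen) -> Optional[Annotation]:
    """Illustrative enrichment: translate a label via a hard-coded table."""
    translations = {"Nederland": "Netherlands"}  # assumption: toy lookup
    label = specimen.data.get("country")
    if label in translations:
        return Annotation("country", translations[label],
                          "toy-translator", "machine", "enrichment")
    return None


def run_machine_steps(record: dict, steps: List[Step]) -> DigitalSpecimen:
    """Ingest one record and collect the annotations produced by each step."""
    specimen = DigitalSpecimen(identifier=record["id"], data=dict(record))
    for step in steps:
        annotation = step(specimen)
        if annotation is not None:
            specimen.annotations.append(annotation)
    return specimen


def human_review(specimen: DigitalSpecimen, target: Annotation,
                 reviewer: str, accepted: bool) -> Annotation:
    """Record a human verdict on a machine annotation as a new annotation."""
    verdict = "accepted" if accepted else "rejected"
    note = Annotation(target.target_field, f"{verdict}: {target.body}",
                      reviewer, "human", "validation")
    specimen.annotations.append(note)
    return note


if __name__ == "__main__":
    record = {"id": "example-1", "country": "Nederland", "countryCode": "nl"}
    specimen = run_machine_steps(record, [check_country_code, suggest_translation])
    human_review(specimen, specimen.annotations[1], "curator-1", accepted=True)
    for a in specimen.annotations:
        print(a.creator_type, a.motivation, a.target_field, a.body)
```

In this sketch the original record is never overwritten: checks, enrichments and human verdicts are all stored as separate annotations next to the data, so it stays visible which statements came from a machine and which from a person. Whether the real infrastructure separates these in the same way is not stated in the abstract.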
