Human-Machine Information Extraction Simulator for Biological Collections

Icaro Alzuru, Jose A.B. Fortes, Andrea Matsunaga, Mauricio Tsugawa, Aditi Malladi

doi:10.1109/bigdata47090.2019.9005601

Abstract

In the last decade, institutions around the world have launched initiatives to digitize biological collections (biocollections) and share their information online. The metadata on specimens’ labels is transcribed from photographs through human-centered approaches (e.g., crowdsourcing) because fully automated Information Extraction (IE) methods still generate a significant number of errors. The integration of human and machine tasks has been proposed to accelerate IE from the billions of specimens waiting to be digitized. Nevertheless, to conduct research and try new techniques, IE practitioners need to prepare sets of images, set up crowdsourcing experiments, recruit volunteers, process the transcriptions, generate ground truth values, program automated methods, etc. These research resources and processes take time and effort to develop and architect into a functional system. In this paper, we present a simulator intended to accelerate experimentation with workflows for extracting Darwin Core (DC) terms from images of specimens. The so-called HuMaIN Simulator includes the engine, the human-machine IE workflows for three DC terms, the code of the automated IE methods, crowdsourced and ground truth transcriptions of the DC terms of three biocollections, and several experiments that exemplify its potential use. The simulator adds Human-in-the-loop capabilities for iterative IE and research on optimal methods. Its practical design permits the quick definition, customization, and implementation of experimental IE scenarios.
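
The idea of simulating a human-machine workflow from stored results can be illustrated with a minimal Python sketch. It is not the HuMaIN Simulator's code or API: the `Specimen` record, the `simulate` function, the confidence score, the threshold rule, and the toy `eventDate` values are all assumptions invented for illustration. The sketch replays pre-recorded machine and crowdsourced transcriptions, sends a specimen to the "human" step only when the machine output is missing or below a confidence threshold, and scores the result against ground truth.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Specimen:
    """Hypothetical record of stored results for one specimen image."""
    image_id: str
    machine_value: Optional[str]  # stored output of an automated IE method, if any
    machine_confidence: float     # assumed confidence score in [0, 1]
    crowd_value: str              # stored crowdsourced transcription
    truth: str                    # ground-truth value


def simulate(specimens: List[Specimen], threshold: float) -> Tuple[float, int]:
    """Replay a hybrid workflow and return (accuracy, number of human tasks)."""
    correct = 0
    human_tasks = 0
    for s in specimens:
        if s.machine_value is not None and s.machine_confidence >= threshold:
            value = s.machine_value   # accept the automated result
        else:
            value = s.crowd_value     # fall back to the stored human transcription
            human_tasks += 1
        if value == s.truth:
            correct += 1
    accuracy = correct / len(specimens) if specimens else 0.0
    return accuracy, human_tasks


# Invented toy data for the Darwin Core term eventDate (not from the paper).
TOY = [
    Specimen("img1", "1987-05-12", 0.95, "1987-05-12", "1987-05-12"),
    Specimen("img2", "1923-08-0l", 0.40, "1923-08-01", "1923-08-01"),
    Specimen("img3", None, 0.0, "2001-11-30", "2001-11-30"),
    Specimen("img4", "1999-01-07", 0.90, "1999-01-01", "1999-01-01"),
]

if __name__ == "__main__":
    for t in (0.8, 1.0):
        acc, humans = simulate(TOY, t)
        print(f"threshold={t}: accuracy={acc:.2f}, human tasks={humans}")
```

Because both the machine and the human outputs are already stored, changing the threshold and rerunning costs nothing, which is the kind of quick what-if experiment the abstract describes. On this toy data, raising the threshold trades more human tasks for higher accuracy.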
