The DoeDat platform was launched by Meise Botanic Garden in 2018 to capture label data from imaged herbarium specimens by inviting volunteer contributors (Groom et al. 2018). It has since facilitated data capture from specimens of other natural history collections (Helminger et al. 2020, Mitrache et al. 2023), as well as from digitised content from various other disciplines, such as historical photographs, posters and postcards. Volunteers may simply transcribe handwritten and/or typed text, but they often also interpret the sparse and scattered information on the image, which includes attempting to georeference the original location. As of April 2024, almost 650,000 tasks had been completed, more than 470,000 of which concerned herbarium specimens from Meise. DoeDat supports domain standards, including Darwin Core, and follows most of the currently drafted MIDS (Minimum Information about a Digital Specimen) guidelines on which data are captured for natural history specimens. However, images must be pre-loaded into server storage for each project, and captured data are exported as one or more CSV files per project. These data files still need to be processed before they can be ingested into the local management system (Engledow et al. 2023). Often the data are also subjected to additional quality control before they are openly published. As a result, the pipeline from image to openly published annotations can be quite time-consuming and labour-intensive. As the biodiversity infrastructure landscape moves further towards FAIR (Findable, Accessible, Interoperable, Reusable) open data, DoeDat will adapt accordingly. This includes digital objects that are easy to annotate. Furthermore, image servers that follow IIIF (International Image Interoperability Framework) greatly standardise access to and portability of media content, drastically changing the way images are handled.
We envision upgrading the DoeDat platform to load images and any required metadata as IIIF manifests, greatly streamlining the process of adding new content and tracking provenance. The resulting transcriptions should be accessible to external systems, which could load the updated image manifests and publish the transcriptions as annotations, for example as nanopublications.
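To illustrate the envisioned flow, the following is a minimal sketch rather than DoeDat's actual implementation. It assumes a manifest shaped like the IIIF Presentation API 3.0, reads the image attached to each canvas, and returns an updated manifest in which one transcribed field is attached to the canvas as an annotation. All URLs, the helper names `painted_images` and `add_transcription`, the example value, and the choice to encode the Darwin Core term as a `term: value` text body are assumptions made for illustration only.

```python
import copy
import json

# Hypothetical minimal IIIF Presentation 3.0 manifest for one specimen image.
manifest = {
    "@context": "http://iiif.io/api/presentation/3/context.json",
    "id": "https://example.org/iiif/specimen-1/manifest",
    "type": "Manifest",
    "label": {"en": ["Example herbarium sheet"]},
    "items": [{
        "id": "https://example.org/iiif/specimen-1/canvas/1",
        "type": "Canvas",
        "height": 3000,
        "width": 2000,
        "items": [{
            "id": "https://example.org/iiif/specimen-1/page/1",
            "type": "AnnotationPage",
            "items": [{
                "id": "https://example.org/iiif/specimen-1/annotation/image",
                "type": "Annotation",
                "motivation": "painting",
                "body": {
                    "id": "https://example.org/images/specimen-1.jpg",
                    "type": "Image",
                    "format": "image/jpeg",
                },
                "target": "https://example.org/iiif/specimen-1/canvas/1",
            }],
        }],
    }],
}


def painted_images(manifest):
    """Return (canvas id, image URL) pairs for all painting annotations."""
    pairs = []
    for canvas in manifest.get("items", []):
        for page in canvas.get("items", []):
            for anno in page.get("items", []):
                if anno.get("motivation") == "painting":
                    pairs.append((canvas["id"], anno["body"]["id"]))
    return pairs


def add_transcription(manifest, canvas_id, term, value):
    """Return a copy of the manifest with one transcribed field attached
    to the given canvas as a 'supplementing' annotation."""
    updated = copy.deepcopy(manifest)  # leave the input manifest untouched
    for canvas in updated["items"]:
        if canvas["id"] != canvas_id:
            continue
        pages = canvas.setdefault("annotations", [])
        if not pages:
            pages.append({
                "id": canvas_id + "/transcriptions",  # hypothetical page id
                "type": "AnnotationPage",
                "items": [],
            })
        page = pages[0]
        page["items"].append({
            "id": f"{page['id']}/{len(page['items']) + 1}",
            "type": "Annotation",
            "motivation": "supplementing",
            "body": {
                "type": "TextualBody",
                "value": f"{term}: {value}",  # assumed encoding of the field
                "format": "text/plain",
            },
            "target": canvas_id,
        })
    return updated


images = painted_images(manifest)
canvas_id = images[0][0]
updated = add_transcription(manifest, canvas_id, "dwc:recordedBy", "J. Doe")
print(json.dumps(updated["items"][0]["annotations"], indent=2))
```

An external system could then retrieve the updated manifest and read the annotation pages directly, without a per-project CSV export. Publishing the same statements as nanopublications would be a separate step that this sketch does not cover.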