Abstract

On herbarium sheets, data elements such as plant name, collection site, collector, barcode and accession number are found mostly on labels glued to the sheet. The data are thus visible on specimen images. As technologies for mass digitization of collections continue to improve, it has become increasingly easy to produce high-quality images of herbarium sheets, and in the last few years herbarium collections worldwide have started to digitize specimens on an industrial scale (Tegelberg et al. 2014). To use the label data contained in these massive numbers of images, the data have to be captured and databased. Currently, manual data entry prevails and is the principal cost and time limitation in the digitization process. The StanDAP-Herb Project has developed a standard process for (semi-)automatic detection of data on herbarium sheets. This is a formal, extensible workflow that integrates a wide range of automated specimen image analysis services in order to replace time-consuming manual data input as far as possible. We have created web services for OCR (Optical Character Recognition), for identifying regions of interest in specimen images, and for the context-sensitive extraction of information from text recognized by OCR. We implemented the workflow as an extension of the OpenRefine platform (Verborgh and De Wilde 2013).
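To make the idea of context-sensitive extraction from OCR text concrete, the following is a minimal sketch in Python. It is not the StanDAP-Herb implementation: the function name `extract_label_fields`, the `LabelRecord` fields, the regular expressions and the sample label text are all assumptions chosen for illustration, and real labels vary far more than these patterns allow. The point is only that a field such as the collector can be located through its textual context (here, a preceding cue like "leg.").

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabelRecord:
    """Hypothetical container for a few fields read from one label."""
    barcode: Optional[str] = None
    collector: Optional[str] = None
    collection_date: Optional[str] = None


# Illustrative patterns only (assumptions, not taken from the project).
# Barcode: 1-4 capital letters, optional space, 6-10 digits.
BARCODE_RE = re.compile(r"\b([A-Z]{1,4})\s?(\d{6,10})\b")
# Collector: text following a cue word such as "leg." or "coll.".
COLLECTOR_RE = re.compile(
    r"(?:leg\.|coll\.|collector:?)\s*([^\n,;]+)", re.IGNORECASE
)
# Date: numeric day, month and four-digit year separated by "." or "/".
DATE_RE = re.compile(r"\b(\d{1,2}[./]\d{1,2}[./]\d{4})\b")


def extract_label_fields(ocr_text: str) -> LabelRecord:
    """Pull a few fields out of raw OCR text using contextual cues."""
    record = LabelRecord()

    match = BARCODE_RE.search(ocr_text)
    if match:
        # Normalize by dropping the optional space between prefix and digits.
        record.barcode = match.group(1) + match.group(2)

    match = COLLECTOR_RE.search(ocr_text)
    if match:
        record.collector = match.group(1).strip()

    match = DATE_RE.search(ocr_text)
    if match:
        record.collection_date = match.group(1)

    return record


if __name__ == "__main__":
    # Made-up label text standing in for the output of an OCR service.
    sample = (
        "Herbarium Example\n"
        "B 100012345\n"
        "Carex flava L.\n"
        "leg. A. Example, 12.06.1998\n"
    )
    print(extract_label_fields(sample))
```

In a workflow like the one described, a function of this kind would sit behind a web service and receive text from the OCR step, with unmatched fields left empty for manual completion.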


