Abstract

The New York Botanical Garden Herbarium has been databasing and imaging its estimated 7.3 million plant specimens for the past 17 years. Because of the size of the collection, we have been selectively digitizing fundable subsets of specimens, making successive passes through the herbarium with each new grant. With this strategy, the average rate for databasing complete records has been 10 specimens per hour. With 1.3 million specimens databased, this effort has taken about 130,000 hours of staff time. At this rate, completing the herbarium by digitizing the remaining 6 million specimens would require another 600,000 hours. Given the current biodiversity and economic crises, there is neither the time nor the money to complete the collection at this rate.

Funded by several grants over the last few years, The New York Botanical Garden has been testing new protocols and tactics to increase the rate of digitization, combining data collaboration, field book digitization, partial data entry and imaging, and optical character recognition (OCR) of specimen images. With the launch of the National Science Foundation’s new Advancing Digitization of Biological Collections program, we hope to move forward with larger, more efficient digitization projects, capturing data from larger portions of the herbarium at a fraction of the cost and time.
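The staff-time figures above follow from dividing the number of specimens by the average databasing rate. A minimal Python sketch of that arithmetic, using the totals stated in the abstract; the function name `staff_hours` is illustrative, and the last calculation, which applies the 30 records per hour reported for pre-loaded records to the entire remaining collection, is a hypothetical comparison rather than a figure from the study.

```python
def staff_hours(specimens: int, records_per_hour: float) -> float:
    """Staff hours needed to database `specimens` at a given average rate."""
    if records_per_hour <= 0:
        raise ValueError("records_per_hour must be positive")
    return specimens / records_per_hour


TOTAL_SPECIMENS = 7_300_000  # estimated size of the NYBG Herbarium
DATABASED = 1_300_000        # specimens databased so far
AVERAGE_RATE = 10            # complete records per hour under the subset strategy

remaining = TOTAL_SPECIMENS - DATABASED

print(f"Hours spent so far: {staff_hours(DATABASED, AVERAGE_RATE):,.0f}")
# Hours spent so far: 130,000
print(f"Hours for the remaining {remaining:,}: {staff_hours(remaining, AVERAGE_RATE):,.0f}")
# Hours for the remaining 6,000,000: 600,000

# Hypothetical comparison: the remaining specimens at 30 records per hour,
# assuming (unrealistically) that every record could be pre-loaded.
print(f"At 30 records/hour: {staff_hours(remaining, 30):,.0f}")
# At 30 records/hour: 200,000
```

Because the hours scale inversely with the rate, tripling throughput cuts the remaining effort to a third, which is why the abstract focuses on raising the per-hour rate rather than on adding staff time.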

Highlights

  • The specimens in the world’s museums and herbaria contain a wealth of primary occurrence data that is used as the basis of many biodiversity research studies (Chapman 2005; Baird 2010; Pyke and Ehrlich 2010).

  • While millions of specimen records are available through institutional portals and distributed networks such as GBIF, these represent only a small fraction of the estimated 90 million herbarium specimens in the United States alone that still need to be digitized (Rabeler and Macklin 2006).

  • Past digitization projects at the New York Botanical Garden Herbarium (NYBG) have focused on manageable and fundable subsets of the collection, of 75,000 to 100,000 specimens, that could be completed within two to three years (Vollmar et al. 2010).

Summary

Introduction

The specimens in the world’s museums and herbaria contain a wealth of primary occurrence data that is used as the basis of many biodiversity research studies (Chapman 2005; Baird 2010; Pyke and Ehrlich 2010).

The specimens were curated, separated and removed from the herbarium for data entry from the specimen labels. With data pre-loaded from the field books and imports, the only information left to add was the taxon and plant description, and the rate of completing fully catalogued records increased to 30 records per hour. At this stage the records were made available online.

The New York Botanical Garden Herbarium uses ABBYY FineReader optical character recognition software to produce text files from specimen labels. Viewing grayscale images reduces the time required to open large files, allowing the cataloger to quickly verify the OCR text, which is then manually parsed into the correct database fields. This step allows other fields in the database to be filled when the label text is imported.
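The text leaves implicit how imported label text and data pre-loaded from field books end up in a single record. A minimal Python sketch of that merge step, under stated assumptions: the OCR output is already available as plain text, records are simple dictionaries, and the field names, regular expressions and sample label are all hypothetical. In the workflow described here the cataloger parses the OCR text manually, so the automatic parsing below stands in for that step only to illustrate the idea.

```python
import re

# Hypothetical patterns for three label fields. Real herbarium labels vary widely,
# so these regular expressions are illustrative, not a general-purpose parser.
FIELD_PATTERNS = {
    "collector_number": re.compile(r"\bNo\.?\s*(\d+)\b", re.IGNORECASE),
    "date": re.compile(r"\b(\d{1,2}\s+[A-Z][a-z]{2,8}\.?\s+\d{4})\b"),
    "elevation": re.compile(r"\b(\d[\d,]*\s*m)\b"),
}


def parse_label(ocr_text: str) -> dict[str, str]:
    """Return the first match for each field found in plain OCR label text."""
    found = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        if match:
            found[field] = match.group(1).strip()
    return found


def merge_into_record(record: dict[str, str], parsed: dict[str, str]) -> dict[str, str]:
    """Fill only empty fields, so values pre-loaded from field books are kept."""
    merged = dict(record)
    for field, value in parsed.items():
        if not merged.get(field):
            merged[field] = value
    return merged


def fields_still_empty(record: dict[str, str]) -> list[str]:
    """List the fields a cataloger would still need to complete by hand."""
    return [field for field, value in record.items() if not value]


# Hypothetical record pre-loaded from a field book, and hypothetical OCR output.
record = {
    "collector": "A. Collector",
    "collector_number": "1234",
    "date": "1987-06-12",
    "locality": "Example County",
    "elevation": "",
    "taxon": "",
    "description": "",
}
ocr_text = "Plants of Example County\nA. Collector No. 1234\n12 June 1987\nElev. 1,200 m"

merged = merge_into_record(record, parse_label(ocr_text))
print(merged["date"])              # 1987-06-12 (pre-loaded value kept)
print(merged["elevation"])         # 1,200 m (filled from the label text)
print(fields_still_empty(merged))  # ['taxon', 'description']
```

Filling only empty fields is a design choice of this sketch: it treats values already entered from field books as authoritative, so the label import can only add information, and whatever remains empty (here the taxon and description) is what is left for the cataloger.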
