Abstract

Background: Global Plants, a collaborative between JSTOR and some 300 herbaria, now contains about 2.48 million high-resolution images of plant specimens, a number that continues to grow. Collections that are digitizing their specimens at high resolution are allocating considerable resources to the maintenance of computer hardware (e.g., servers) and to acquiring digital storage space. We here apply machine learning, specifically the training of a support vector machine, to classify specimen images into categories, ideally at the species level, using the 26 most common tree species in Germany as a test case.

Results: We designed an analysis pipeline and classification system consisting of segmentation, normalization, feature extraction, and classification steps. We evaluated the system in two test sets, one with 26 species, the other with 17, in each case using 10 images per species of plants collected between 1820 and 1995. This simulates the empirical situation that most named species are represented in herbaria and databases, such as JSTOR, by few specimens. We achieved 73.21% accuracy of species assignments in the larger test set, and 84.88% in the smaller test set.

Conclusions: The results of this first application of a computer vision algorithm trained on images of herbarium specimens show that, despite the problem of overlapping leaves, leaf-architectural features can be used to categorize specimens to species with good accuracy. Computer vision is poised to play a significant role in future rapid identification, at least for frequently collected genera or species in the European flora.

Electronic supplementary material: The online version of this article (doi:10.1186/s12862-016-0827-5) contains supplementary material, which is available to authorized users.

Highlights

  • Global Plants, a collaborative between JSTOR and some 300 herbaria, contains about 2.48 million high-resolution images of plant specimens, a number that continues to grow, and collections that are digitizing their specimens at high resolution are allocating considerable resources to the maintenance of computer hardware and to acquiring digital storage space

  • Computer vision has not been applied to JSTOR plant specimen images

  • Because of the great number of databased images available and the comparatively few taxonomic experts, there is a great need for computer vision to be applied to specimen images, millions of which are being made available online at substantial cost ([16]; cf. our Introduction with data on JSTOR images and usage numbers)


Summary

Introduction

Global Plants, a collaborative between JSTOR and some 300 herbaria, contains about 2.48 million high-resolution images of plant specimens, a number that continues to grow, and collections that are digitizing their specimens at high resolution are allocating considerable resources to the maintenance of computer hardware (e.g., servers) and to acquiring digital storage space. Global Plants is the world’s largest database of digitized plant specimens (http://about.jstor.org/content/global-plants, accessed 20 April 2016). It currently contains images of 2,482,901 herbarium sheets (Hannah Begley, Digital Librarian for Primary Sources, JSTOR, 4 May 2016). During a slightly larger period of 1.5 years (Dec. 2014-April 2016), there were 419,822 unique visitors (Hannah Begley, Digital Librarian for Primary Sources, JSTOR, 4 May 2016). At this time, only people who know the Latin name of a plant (or its synonymous names) can find images of specific species in JSTOR.
