Automated Herbarium Specimen Identification using Deep Learning

Jose Carranza-Rojas, Hervé Goëau, Alexis Joly, Pierre Bonnet, Erick Mata-Montero

doi:10.3897/tdwgproceedings.1.20302

Abstract

Hundreds of herbarium collections have accumulated a valuable heritage and knowledge of plants over several centuries (Page et al. 2015). Recent initiatives, such as iDigBio (https://www.idigbio.org), aggregate data from and images of vouchered herbarium sheets (and other biocollections) and make this information available to botanists and the general public worldwide through web portals. These ambitious plans to transform historical biodiversity data into digital format, and so preserve them, are supported by the United States National Science Foundation (NSF) Advancing the Digitization of Natural History Collections (ADBC) program; the digitization itself is carried out by the Thematic Collections Networks (TCNs) funded under that program. However, thousands of herbarium sheets are still unidentified at the species level, while numerous sheets should be reviewed and updated to reflect more recent taxonomic knowledge. These annotations and revisions require an unrealistic amount of work for botanists to carry out in a reasonable time (Bebber et al. 2010). Computer vision and machine learning approaches applied to herbarium sheets are promising (Wijesingha and Marikar 2012) but are still not well studied compared to automated species identification from leaf scans or pictures of plants taken in the field. In a recent study, we evaluated how accurately herbarium images can be exploited for species identification with deep learning technology (Carranza-Rojas et al. 2017), particularly Convolutional Neural Networks (CNNs) (Szegedy et al. 2015). Because these networks are differentiable and therefore trainable end-to-end, they learn the most prominent visual patterns in the images automatically, as opposed to previous approaches that rely on custom, hand-crafted feature extractors.

The study addresses three challenges. The first is to use herbarium sheet images alone to automatically identify the species of plants mounted on herbarium sheets. The second is to determine whether combining herbarium sheet images with photos of plants in the field (Joly et al. 2015, Carranza-Rojas and Mata-Montero 2016) is a viable way to train models that give accurate identifications. Finally, we explore whether herbarium images from one region with a specific flora can be used in transfer learning to another region with other species, for example a region under-represented in terms of collected data. (Transfer learning is a deep learning technique in which a model is first trained on one dataset, and its learned weights then serve as the starting point for training another model.) Our evaluation shows that the accuracy of species identification with deep learning technology, based on herbarium images, reaches 90.3% on a dataset of more than 1,200 European plant species. This could potentially lead to the creation of a semi-automated or even fully automated system to help taxonomists and experts with their annotation, classification, and revision work. In this paper, we take a closer look at the accuracy levels achieved with respect to the first two challenges. We evaluate the accuracy levels for each species included in the dataset, which encompasses 253,733 images of 1,204 species.
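To make the per-species evaluation concrete, here is a minimal sketch of how overall and per-species accuracy could be computed from a list of predictions. It assumes one predicted species per image (top-1); the abstract does not state which metric underlies the reported figure. The function name `per_species_accuracy` and the toy labels are hypothetical and are not results from the study.

```python
from collections import defaultdict


def per_species_accuracy(true_labels, predicted_labels):
    """Return (overall_accuracy, {species: accuracy}) for top-1 predictions."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one labelled image is required")

    correct = defaultdict(int)  # correctly identified images per species
    total = defaultdict(int)    # all evaluated images per species
    for truth, guess in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1

    overall = sum(correct.values()) / len(true_labels)
    by_species = {species: correct[species] / total[species] for species in total}
    return overall, by_species


# Toy data for illustration only (hypothetical labels, not study results).
truth = ["Quercus robur", "Quercus robur",
         "Acer campestre", "Acer campestre", "Acer campestre"]
guess = ["Quercus robur", "Acer campestre",
         "Acer campestre", "Acer campestre", "Quercus robur"]
overall, by_species = per_species_accuracy(truth, guess)
```

Reporting accuracy per species as well as overall shows whether a high aggregate figure hides species that are rarely identified correctly, for instance those with few images.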

Highlights

We evaluate the accuracy with which herbarium images can be exploited for species identification with deep learning technology.
We explore whether herbarium images from one region with a specific flora can be used in transfer learning to another region with other species, for example a region under-represented in terms of collected data.
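The transfer-learning idea can be illustrated with a small, framework-free sketch of its first step: initialising a model for the target region from weights learned on the source region. This is only an illustration under stated assumptions, not the architecture or procedure used in the study. The parameter layout (a dict with a `"classifier"` entry holding one weight row per species) and the function name `init_from_pretrained` are hypothetical. In practice the resulting model would then be fine-tuned on images from the target region using a deep learning framework.

```python
import copy
import random


def init_from_pretrained(source_params, num_target_species, seed=0):
    """Build target-model parameters from a source model's learned weights.

    Every layer except the final classifier is copied, so the target model
    starts from the visual features learned on the source flora. The
    classifier is re-initialised because the target region has a different
    set of species.
    """
    rng = random.Random(seed)
    target = {
        name: copy.deepcopy(weights)
        for name, weights in source_params.items()
        if name != "classifier"
    }
    # Each classifier row has one weight per feature fed into it.
    feature_dim = len(source_params["classifier"][0])
    target["classifier"] = [
        [rng.gauss(0.0, 0.01) for _ in range(feature_dim)]
        for _ in range(num_target_species)
    ]
    return target


# Hypothetical source model: 3 learned features, 4 source-region species.
source = {
    "features": [[0.2, -0.1], [0.4, 0.3], [0.0, 0.5]],
    "classifier": [[0.1, 0.2, 0.3] for _ in range(4)],
}
# Target region with 6 species: features are reused, classifier starts fresh.
target = init_from_pretrained(source, num_target_species=6)
```

Copying the feature layers rather than sharing them means later fine-tuning on the target region leaves the source model unchanged.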

Corresponding authors: Jose Carranza-Rojas (jcarranza@itcr.ac.cr), Alexis A.J. Joly (alexis.joly@inria.fr)
Received: 14 Aug 2017 | Published: 16 Aug 2017
Citation: Carranza-Rojas J, Joly A, Bonnet P, Goëau H, Mata-Montero E (2017) Automated Herbarium Specimen Identification using Deep Learning.

Journal: Proceedings of TDWG	Publication Date: Aug 16, 2017
Citations: 6	License type: CC BY 4.0
