Abstract

With the growth of herbarium digitization efforts worldwide, dataset repositories such as iDigBio and GBIF now hold hundreds of thousands of herbarium sheet images ready for exploration. Although this provides a new source of plant leaf data, herbarium datasets pose an inherent challenge: the sheets also contain non-plant objects such as color charts, barcodes, and labels. Even within the plant itself, overlapping, damaged, and intact individual leaves appear together with other plant organs such as stems and fruits, which increases the complexity of leaf trait extraction and analysis. Focusing on segmentation and trait extraction for individual intact herbarium leaves, this study proposes a pipeline consisting of a deep learning semantic segmentation model (DeepLabv3+), connected component analysis, and a single-leaf classifier trained on binary images to automate the extraction of intact individual leaves together with their phenotypic traits. The proposed method achieved a higher F1-score on both the in-house dataset (96%) and a publicly available herbarium dataset (93%) than object detection-based approaches, including Faster R-CNN and YOLOv5. Furthermore, the phenotypic measurements extracted from the individual leaves segmented by the proposed approach were closer to the ground truth measurements, which suggests the importance of the segmentation process in handling background noise. Compared to the object detection-based approaches, the proposed method shows a promising direction toward an autonomous tool for extracting individual leaves, together with their trait data, directly from herbarium specimen images.
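The middle stage of the pipeline described above, connected component analysis followed by a single-leaf classifier, can be illustrated with a minimal sketch. It assumes the segmentation model has already produced a binary leaf mask (1 = leaf pixel). The function names, the use of 4-connectivity, and the area-threshold `is_single_leaf` stand-in are hypothetical and chosen for illustration only; the study itself uses a trained classifier on binary images, not an area rule.

```python
from collections import deque


def label_components(mask):
    """Return the 4-connected foreground regions of a binary mask.

    `mask` is a list of rows containing 0 (background) or 1 (leaf).
    Each region is returned as a list of (row, col) pixel coordinates.
    """
    rows, cols = len(mask), len(mask[0])
    visited = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or visited[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited leaf pixel.
            visited[r][c] = True
            pixels = []
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and not visited[ny][nx]):
                        visited[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def is_single_leaf(pixels, min_area=4):
    """Hypothetical stand-in for the trained single-leaf classifier.

    Assumption: a simple area threshold replaces the real classifier here.
    """
    return len(pixels) >= min_area


def extract_candidate_leaves(mask, classifier=is_single_leaf):
    """Keep only the components that the classifier accepts."""
    return [p for p in label_components(mask) if classifier(p)]


# Toy mask: a 2x2 blob, an isolated pixel, and a 2x3 blob.
leaf_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
]
candidates = extract_candidate_leaves(leaf_mask)
```

In this toy mask the isolated pixel is rejected and the two larger blobs are kept. In practice a library routine for component labeling would normally replace the hand-written flood fill.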

Highlights

  • Herbarium specimen collections present a unique botanical source of information

  • The proposed semantic segmentation-based approach for extracting individual intact leaves is more efficient and accurate than the existing object detection approaches. The method has four benefits: (1) the semantic segmentation model enables the extraction of individual leaves even with a weak classifier trained on binary images from a small dataset; (2) in contrast to object detection-based approaches, the semantic segmentation model can serve as a pre-processing step that removes the visual noise present in herbarium specimens before classification algorithms such as those used in [7] are applied or feature extraction is performed; (3) the extracted leaves have a uniform white background, which can be an advantage for downstream tasks such as segmentation for feature extraction, as shown in the results section; and (4) individual leaves can be extracted automatically and directly from herbarium specimen images.

  • In contrast to the proposed method, object detection-based approaches can offer a simple solution for locating and extracting leaves when the target task does not require precise leaf information, such as the extraction of phenotypic features from an individual intact leaf.
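Benefit (3) above, leaves extracted on a uniform white background, can be sketched as follows. The example assumes a grayscale image stored as a list of rows and a leaf given as a list of (row, col) pixel coordinates, for instance one connected component of the segmentation mask. The function names and the chosen measurements (pixel area, bounding-box height and width) are illustrative assumptions, not the trait set used in the study.

```python
def crop_on_white(image, pixels, white=255):
    """Copy one leaf's pixels into a tight crop with a white background.

    `image` is a list of rows of grayscale values; `pixels` lists the
    (row, col) coordinates that belong to the leaf.
    """
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    top, left = min(ys), min(xs)
    height = max(ys) - top + 1
    width = max(xs) - left + 1
    # Start from an all-white canvas, then paste only the leaf pixels.
    crop = [[white] * width for _ in range(height)]
    for y, x in pixels:
        crop[y - top][x - left] = image[y][x]
    return crop


def simple_traits(pixels):
    """Illustrative measurements in pixel units (assumed, not the paper's)."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return {
        "area": len(pixels),
        "bbox_height": max(ys) - min(ys) + 1,
        "bbox_width": max(xs) - min(xs) + 1,
    }


# Toy 3x4 grayscale image and an L-shaped "leaf" of three pixels.
image = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
]
leaf_pixels = [(1, 1), (1, 2), (2, 1)]
leaf_crop = crop_on_white(image, leaf_pixels)
leaf_traits = simple_traits(leaf_pixels)
```

Because everything outside the leaf is set to white, any later thresholding or contour step sees only the leaf. A bounding box returned by an object detector would instead carry whatever background falls inside the box.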

Summary

Introduction

Herbarium specimen collections present a unique botanical source of information. They are important data sources for new species discoveries, plant evolution reconstruction, and studying the impact of climate change [1,2,3]. The final herbarium sheet image contains specimens with folded, overlapping, and single leaves, along with non-plant objects such as color charts, barcodes, rulers, and labels. While these objects are useful to botanists and taxonomists, they are treated as noise when computer vision techniques are applied to identify certain species [7,8]. Effective extraction of individual leaves from herbarium collections will provide a valuable contribution to botanical research for studies focusing on individual leaves, such as [21], by making better use of the available specimen images. This will prove important by improving the sample size of the species for studies being conducted in tropical regions where there is a high. We curated a new dataset of herbarium specimen images together with their pixel-level ground truth annotations, which can be used for training and testing machine learning techniques.

Related Works
Proposed Methodology
Phase 2
Phase 3
Experimental Work
Datasets
Pre-Processing and Training of Semantic Segmentation Model
Comparison with the State-of-the-Art Approaches
Results and Discussions
Evaluation of Semantic Segmentation Model
Evaluation of Single-Leaf Classifier
PROPOSED METHOD
Phenotypic Trait Extraction Process
Conclusions and Future Work