Extracting Masks from Herbarium Specimen Images Based on Object Detection and Image Segmentation Techniques

Jean-Daniel Zucker, Youcef Sklab, Régine Vignes Lebbe, Edi Prifti, Eric Chenin, Marc Pignal, Hanane Ariouat

doi:10.3897/biss.7.112161

Abstract

Herbarium specimen scans constitute a valuable source of raw data. Herbarium collections are attracting growing interest in the scientific community because their exploration can help in understanding serious threats to biodiversity. Data derived from scanned specimen images can be analyzed to answer important questions, such as how plants respond to climate change, how different species respond to biotic and abiotic influences, or what role a species plays within an ecosystem. However, exploiting such large collections is challenging and requires automatic processing. A promising solution lies in computer-based processing techniques such as Deep Learning (DL).

Herbarium specimens can nevertheless be difficult to process and analyze because the scans contain several kinds of visual noise, including information labels, scale bars, color palettes, envelopes containing seeds or other organs, collection-specific barcodes, stamps, and other notes placed on the mounting sheet. Moreover, the paper on which the specimens are mounted can degrade over time for multiple reasons; its color often darkens and, in some cases, approaches the color of the plants. Neural network models are well suited to the analysis of herbarium specimens and can generally abstract away from such visual noise. However, in some cases the model focuses on these elements, which can lead to poor generalization when analyzing new data in which these visual elements are absent (White et al. 2020). It is therefore important to remove the noise from specimen scans before using them for model training and testing, in order to improve model performance.

Earlier studies have used basic cropping techniques (Younis et al. 2018), but these do not guarantee that the visual noise is removed from the cropped image. For instance, labels are frequently placed at varying positions on the sheet, so cropped images can still contain noise. White et al. (2020) used the Otsu binarization method, followed by manual post-processing and a blurring step to adjust the pixels that should have been assigned to black during segmentation. Hussein et al. (2020) used an image labeler application, followed by median filtering to reduce the noise. However, both White et al. (2020) and Hussein et al. (2020) consider only two organs: stems and leaves. Triki et al. (2022) used a polygon-based deep learning object detection algorithm, but in addition to being laborious and difficult, this approach does not give good results when it comes to fully identifying specimens.

In this work, we aim to create clean, high-resolution masks with the same resolution as the original images. These masks can be used by other models for a variety of purposes, for instance to distinguish the different plant organs. We proceed by combining object detection and image segmentation techniques, using a dataset of scanned herbarium specimens. We propose an algorithm that identifies and retains the pixels belonging to the plant specimen and removes the pixels that belong to non-plant elements, which are considered noise. A removed pixel is set to zero (black). Fig. 1 illustrates the complete masking pipeline, which has two main stages: object detection and image segmentation.

In the first stage, we manually annotated a dataset of 950 images with bounding boxes, marking the visual elements considered to be noise (Fig. 2), e.g., scale bar, barcode, stamp, text box, color palette, and envelope. We then trained a model to automatically detect these noise elements so that they can be removed. We divided the dataset into 80% training, 10% validation, and 10% test sets. We ultimately achieved a precision score of 98.2%, a 3% improvement over the baseline.

The results of this stage were then used as input for the second stage, image segmentation, which generates the final mask. We blackened the pixels covered by the detected noise elements, then used HSV (Hue Saturation Value) color segmentation to select only the pixels whose values fall in a range that corresponds mostly to plant colors. Finally, to remove the remaining noise, we applied the morphological opening operation, which removes small noise and separates objects, and the closing operation, which fills gaps, as described in Sunil Bhutada et al. (2022). The output is a mask that retains only the pixels that belong to the plant. Unlike other proposed approaches, which focus essentially on leaves and stems, our approach covers all the plant organs (Fig. 3).

Our approach removes the background noise from herbarium scans and extracts clean plant images, an important step before using these images in different deep learning models. However, the quality of the extractions varies with the quality of the scans, the condition of the specimens, and the paper used. For example, extractions from specimens whose plant color differs clearly from the background color were more accurate than extractions from specimens whose plant and background colors are close. To overcome this limitation, we aim to use some of the obtained extractions to create a training dataset, and then to develop and train a generative deep learning model that generates masks delimiting the plants.
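
The segmentation stage described above (blackening the detected noise boxes, HSV thresholding, then morphological opening and closing) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes images are small nested lists of RGB tuples, bounding boxes are given as hypothetical `(x0, y0, x1, y1)` tuples, the structuring element is a 3x3 square, and the HSV thresholds are illustrative guesses for "plant-like" colors. The abstract does not state any of these values, and all function names are invented for the example. In practice an image-processing library such as OpenCV would normally be used on full-resolution scans.

```python
import colorsys

BLACK = (0, 0, 0)

def blacken_boxes(img, boxes):
    """Return a copy of img with every pixel inside each (x0, y0, x1, y1) box set to black."""
    out = [row[:] for row in img]
    for x0, y0, x1, y1 in boxes:
        for y in range(y0, y1):
            for x in range(x0, x1):
                out[y][x] = BLACK
    return out

def hsv_mask(img, h_range=(0.15, 0.50), s_min=0.25, v_min=0.15):
    """Binary mask: 1 where a pixel falls in the ASSUMED plant HSV range.

    colorsys returns hue, saturation and value in [0, 1]; the thresholds
    here are illustrative only and are not taken from the abstract.
    """
    mask = []
    for row in img:
        mask_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            plant_like = h_range[0] <= h <= h_range[1] and s >= s_min and v >= v_min
            mask_row.append(1 if plant_like else 0)
        mask.append(mask_row)
    return mask

def _morph(mask, op):
    """Apply op (min = erosion, max = dilation) over each 3x3 neighbourhood.

    Neighbours outside the image are ignored.
    """
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            out[y][x] = op(
                mask[yy][xx]
                for yy in range(max(0, y - 1), min(height, y + 2))
                for xx in range(max(0, x - 1), min(width, x + 2))
            )
    return out

def opening(mask):
    """Erosion followed by dilation: removes specks smaller than the 3x3 element."""
    return _morph(_morph(mask, min), max)

def closing(mask):
    """Dilation followed by erosion: fills small gaps inside objects."""
    return _morph(_morph(mask, max), min)

def extract_plant(img, boxes):
    """Blacken noise boxes, threshold in HSV, clean the mask, and apply it to the image."""
    cleaned = blacken_boxes(img, boxes)
    mask = closing(opening(hsv_mask(cleaned)))
    return [
        [px if keep else BLACK for px, keep in zip(row, mask_row)]
        for row, mask_row in zip(cleaned, mask)
    ]
```

Because the output has the same dimensions as the input and only sets rejected pixels to black, the result keeps the resolution of the original image. One design point worth noting in this sketch is that a green element such as a color chart would pass the HSV threshold on its own, which is why the detected boxes are blackened before color segmentation.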
