Abstract

Digitized specimens are an indispensable resource for rapidly acquiring big datasets and typically must be pre‐processed prior to conducting analyses. One crucial image pre‐processing step in any image analysis workflow is image segmentation, or the ability to clearly contrast the foreground target from the background noise in an image. This procedure is typically done manually, creating a potential bottleneck for efforts to quantify biodiversity from image databases. Image segmentation meta‐algorithms using deep learning provide an opportunity to relax this bottleneck. However, the most accessible pre‐trained convolutional neural networks (CNNs) have been trained on a small fraction of biodiversity, thus limiting their utility. We trained a deep learning model to automatically segment target fish from images with both standardized and complex, noisy backgrounds. We then assessed the performance of our deep learning model using qualitative visual inspection and quantitative image segmentation metrics of pixel overlap between reference segmentation masks generated manually by experts and those automatically predicted by our model. Visual inspection revealed that our model segmented fishes with high precision and relatively few artifacts. These results suggest that the meta‐algorithm (Mask R‐CNN), on which our current fish segmentation model relies, is well suited for generating high‐fidelity segmented specimen images across a variety of background contexts at a rapid pace. We present Sashimi, a user‐friendly command line toolkit to facilitate rapid, automated high‐throughput image segmentation of digitized organisms. Sashimi is accessible to non‐programmers and does not require experience with deep learning to use. The flexibility of Mask R‐CNN allows users to generate a segmentation model for use on diverse animal and plant images using transfer learning with training datasets as small as a few hundred images.
To help grow the taxonomic scope of images that can be recognized, Sashimi also includes a central database for sharing and distributing custom‐trained segmentation models of other unrepresented organisms. Lastly, Sashimi includes both auxiliary image pre‐processing functions useful for some popular downstream color pattern analysis workflows, as well as a simple script to aid users in qualitatively and quantitatively assessing segmentation model performance for complementary sets of automatically and manually segmented images.
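The quantitative assessment described above compares pixel overlap between manually drawn reference masks and automatically predicted masks. A standard metric for this comparison is Intersection over Union (IoU). The sketch below is illustrative only, assuming binary NumPy masks; the `iou` helper is a hypothetical function, not part of Sashimi's API.

```python
import numpy as np

def iou(mask_a, mask_b):
    """Intersection over Union between two binary segmentation masks.

    An IoU of 1.0 means perfect pixel-level agreement; 0.0 means no overlap.
    """
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(np.logical_and(a, b).sum() / union)

# Toy 3x3 masks: a manual reference mask and a model prediction
manual = [[1, 1, 0],
          [1, 1, 0],
          [0, 0, 0]]
auto   = [[0, 1, 1],
          [0, 1, 1],
          [0, 0, 0]]
print(round(iou(manual, auto), 4))  # 2 shared pixels / 6 total = 0.3333
```

In practice, one IoU value would be computed per image pair and summarized (e.g., mean IoU) across the evaluation set.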
