Abstract

This letter proposes a novel method to obtain panoptic predictions by extending the semantic segmentation task with a few non-learning image processing steps, which presents the following benefits: (1) annotations do not require a specific format (e.g., COCO); (2) fewer parameters (e.g., a single loss function and no object detection parameters); and (3) a more straightforward sliding-window implementation for classifying large images (still unexplored for panoptic segmentation). Semantic segmentation models do not individualize touching objects, since their predictions can merge, i.e., a single polygon may represent many targets. Our method overcomes this problem by isolating the objects using borders on the polygons that may merge. Data preparation requires generating a 1-pixel border between touching objects; for unique object identification, we list the isolated polygons, attribute a different value to each one, and apply the expanding border (EB) algorithm to those with borders. Although any semantic segmentation model applies, we used a U-Net with three backbones (EfficientNet-B5, EfficientNet-B3, and EfficientNet-B0). The results show that (1) EfficientNet-B5 achieved the best results, with 70% mIoU; (2) the EB algorithm produced larger gains for the stronger models; (3) the panoptic metrics show a high capability of identifying things and stuff, with a Panoptic Quality (PQ) of 65; and (4) the sliding window on a 2560x2560-pixel area showed promising results, with the ratio of merged objects to correct predictions below 1% for all classes.
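Since the EB algorithm is only summarized above, a minimal sketch of the described steps (label each isolated polygon with a unique value, then expand the labels back over the removed 1-pixel border) might look like the following. The function name `expand_borders`, the use of `scipy.ndimage`, and the tie-breaking rule for touching objects are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy import ndimage


def expand_borders(pred_mask: np.ndarray, border_width: int = 1):
    """Recover individual objects from a semantic prediction whose
    touching objects are separated by a thin background border.

    pred_mask    : 2-D boolean array, True where the model predicted
                   the target class (borders appear as False gaps).
    border_width : number of border pixels removed at annotation time
                   (1 in the letter's data preparation).
    """
    # Step 1: label each isolated polygon with a unique integer id.
    labels, num_objects = ndimage.label(pred_mask)

    # Step 2: grow every label outward, one dilation per border pixel,
    # to reclaim the border that was erased during annotation.
    for _ in range(border_width):
        grown = ndimage.grey_dilation(labels, size=(3, 3))
        # Only fill still-unlabeled pixels, so existing objects are
        # never overwritten; where two objects compete for the same
        # border pixel, the larger id wins (an assumed tie-break).
        labels = np.where(labels == 0, grown, labels)

    # Simplification: this also grows objects one pixel into true
    # background; restricting growth to known border pixels would
    # require the original (un-bordered) foreground mask.
    return labels, num_objects


# Toy usage: two objects separated by a 1-pixel border column.
mask = np.array([[1, 1, 0, 1, 1],
                 [1, 1, 0, 1, 1]], dtype=bool)
instances, n = expand_borders(mask)
# n == 2; after one expansion step the border column is reassigned
# to one of the neighbouring objects.
```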
