Abstract

Panoptic segmentation provides a rich 2D environment representation by unifying semantic and instance segmentation. Most current state-of-the-art panoptic segmentation methods are built upon two-stage detectors and, due to their high computational complexity, are not suitable for real-time applications such as automated driving. In this work, we introduce a novel, fast and accurate single-stage panoptic segmentation network that employs a shared feature extraction backbone and three network heads for object detection, semantic segmentation, and instance-level attention masks. Guided by object detections, our new panoptic segmentation head learns instance-specific soft attention masks based on spatial embeddings. The semantic masks for stuff classes and the soft instance masks for thing classes are pixel-wise coherent and can be easily integrated into a panoptic output. The training and inference pipelines are simplified, and no post-processing of the panoptic output is necessary. Benefiting from its fast inference speed, the network can be deployed in automated vehicles or robotic applications. We perform extensive experiments on the COCO and Cityscapes datasets and obtain competitive results in both accuracy and inference time. On the Cityscapes dataset we achieve a panoptic quality of 59.7 with an inference speed of more than 10 FPS on high-resolution 1024 × 2048 images.
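For readers unfamiliar with the panoptic quality (PQ) metric quoted above, the following is a minimal sketch of its standard per-class definition: the sum of IoUs over matched segment pairs, divided by the number of true positives plus half the false positives and half the false negatives. The function name and the example numbers are illustrative assumptions, not values from the paper; published scores such as 59.7 are this quantity averaged over classes and reported on a 0 to 100 scale.

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """Per-class PQ = sum(IoU of matched pairs) / (TP + 0.5 * FP + 0.5 * FN).

    matched_ious holds one IoU per true-positive match; under the standard
    definition a predicted and a ground-truth segment match only if their
    IoU exceeds 0.5.
    """
    true_positives = len(matched_ious)
    denominator = (true_positives
                   + 0.5 * num_false_positives
                   + 0.5 * num_false_negatives)
    if denominator == 0:
        # No segments of this class in prediction or ground truth.
        return 0.0
    return sum(matched_ious) / denominator


# Illustrative numbers only: three matches, one spurious and one missed segment.
example_pq = panoptic_quality([0.9, 0.8, 0.7], 1, 1)
print(round(100 * example_pq, 1))
```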

Highlights

  • Panoptic segmentation provides a complex understanding of the environment by performing both pixel-level and instance-level classification

  • Significant progress has been achieved by fast networks: [10] builds upon a one-stage detector, [11] clusters dense bounding boxes into instance masks using semantic segmentation, and the proposal-free method [12] predicts instance centers and regresses instance center offsets

  • We propose a novel panoptic head that predicts instance-specific soft attention masks based on instance center offsets and object detections

Summary

Introduction

Panoptic segmentation provides a complex understanding of the environment by performing both pixel-level and instance-level classification. Significant progress has been achieved by fast networks: [10] builds upon a one-stage detector, [11] clusters dense bounding boxes into instance masks using semantic segmentation, and the proposal-free method [12] predicts instance centers and regresses instance center offsets. In our fully convolutional architecture, the semantic and instance-aware soft attention masks are pixel-wise coherent, so a simple multiplication operation yields the final panoptic segmentation result. We consider that our work has practical importance: the proposed network has a single-stage lightweight architecture and a simplified inference pipeline, and it does not require merging heuristics for panoptic segmentation. We propose a lightweight fully convolutional architecture for panoptic segmentation, based on a single-stage object detector that we extend with novel semantic and panoptic heads. We perform extensive experiments on the COCO and Cityscapes datasets and achieve faster inference speeds and better or on-par accuracy compared to existing methods.
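The multiplication-based fusion described above can be illustrated with a small sketch. The summary does not give the exact form of the attention masks, so everything specific here is an assumption made for illustration: the Gaussian weighting of the distance between a pixel's predicted center and the detected box center, the gating by the box, the `sigma` and `threshold` values, and the hypothetical helper names `soft_attention_mask` and `fuse`. It should not be read as the authors' implementation.

```python
import math


def soft_attention_mask(offsets, box, sigma=0.5):
    """Soft mask for one detection (assumed Gaussian form, not from the paper).

    offsets[y][x] is the predicted (dy, dx) from a pixel to its instance center.
    box is (top, left, bottom, right) in inclusive pixel coordinates.
    """
    top, left, bottom, right = box
    center_y = (top + bottom) / 2.0
    center_x = (left + right) / 2.0
    mask = []
    for y, row in enumerate(offsets):
        mask_row = []
        for x, (dy, dx) in enumerate(row):
            if top <= y <= bottom and left <= x <= right:
                # Attention decays with the distance between the center this
                # pixel votes for and the center of the detected box.
                dist_sq = (y + dy - center_y) ** 2 + (x + dx - center_x) ** 2
                mask_row.append(math.exp(-dist_sq / (2.0 * sigma ** 2)))
            else:
                # Assumption: pixels outside the detection get zero attention.
                mask_row.append(0.0)
        mask.append(mask_row)
    return mask


def fuse(thing_prob, masks, threshold=0.5):
    """Multiply semantic probability by each soft mask and keep the best instance.

    Returns an instance-id map: 0 means no instance, k means the k-th detection.
    """
    ids = []
    for y, row in enumerate(thing_prob):
        id_row = []
        for x, prob in enumerate(row):
            scores = [prob * mask[y][x] for mask in masks]
            best = max(range(len(scores)), key=scores.__getitem__) if scores else -1
            id_row.append(best + 1 if best >= 0 and scores[best] > threshold else 0)
        ids.append(id_row)
    return ids


# Toy 1 x 4 image with two overlapping detections (illustrative values only).
offsets = [[(0.0, 1.0), (0.0, 0.0), (0.0, 0.0), (0.0, -1.0)]]
boxes = [(0, 0, 0, 2), (0, 1, 0, 3)]
thing_prob = [[0.9, 0.9, 0.8, 0.2]]

masks = [soft_attention_mask(offsets, box) for box in boxes]
instance_ids = fuse(thing_prob, masks)
print(instance_ids)  # [[1, 1, 2, 0]]
```

In this toy case the two boxes overlap, yet each pixel is assigned to the detection whose center it points to, and the last pixel is left unassigned because its semantic probability is low. No separate heuristic for resolving overlaps is applied.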

Related Work
Panoptic Segmentation Network
Model Architecture
Implementation Details
Experimental Setup
Ablation Studies
Performance on Cityscapes
Method
Performance on COCO
Conclusions