Unified 3D and 4D Panoptic Segmentation via Dynamic Shifting Networks.

Fangzhou Hong, Xinge Zhu, Hongsheng Li, Ziwei Liu, Lingdong Kong, Hui Zhou

doi:10.1109/tpami.2023.3349304

Abstract

With the rapid advances in autonomous driving, it has become critical to equip its sensing system with more holistic 3D perception. However, widely explored tasks such as 3D detection and point cloud semantic segmentation focus on parsing either objects or scenes, but not both. In this work, we address the challenging task of LiDAR-based Panoptic Segmentation, which aims to parse both objects and scenes in a unified manner. In particular, we propose the Dynamic Shifting Network (DS-Net), which serves as an effective panoptic segmentation framework in the point cloud realm. DS-Net features a dynamic shifting module designed for complex LiDAR point cloud distributions: an efficient learnable clustering module that adapts kernel functions to different instances. To further exploit temporal information, we extend the single-scan processing framework to its temporal version, 4D-DS-Net, for the task of 4D Panoptic Segmentation, where the same instance across multiple frames should be given the same ID prediction. Instead of naively appending a tracking module to DS-Net, we propose to solve 4D panoptic segmentation in a more unified way. Specifically, 4D-DS-Net first constructs a 4D data volume by aligning consecutive LiDAR scans, upon which temporally unified instance clustering is performed to obtain the final results. Extensive experiments on two large-scale autonomous driving LiDAR datasets, SemanticKITTI and Panoptic nuScenes, demonstrate the effectiveness and superior performance of the proposed solution.
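To make the idea of a clustering step that "adapts kernel functions for different instances" more concrete, the following is a minimal NumPy sketch of one shift iteration, not the authors' implementation. It assumes a flat (ball) kernel, a small set of candidate bandwidths, and a per-point weight matrix that blends the candidates; none of these details are stated in the abstract. In a learnable module the weights would be predicted by a network, whereas here they are simply passed in. The names `dynamic_shift_step`, `bandwidths`, and `weights` are hypothetical.

```python
import numpy as np


def dynamic_shift_step(points, bandwidths, weights):
    """Run one illustrative shift iteration over a set of points.

    points:     (N, D) float array of point positions (e.g. regressed centers).
    bandwidths: sequence of L candidate kernel radii (assumed flat kernels).
    weights:    (N, L) array; row i blends the L candidates for point i and
                is assumed to sum to 1 (learned in a real model, given here).
    Returns the (N, D) array of shifted positions.
    """
    # Pairwise Euclidean distances between all points, shape (N, N).
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)

    shifted = np.zeros_like(points)
    for j, bandwidth in enumerate(bandwidths):
        # Flat kernel: 1 for neighbours inside the radius, 0 otherwise.
        kernel = (dists <= bandwidth).astype(points.dtype)
        # Mean of the neighbours; each point is its own neighbour, so the
        # row sums are at least 1 and the division is safe.
        means = kernel @ points / kernel.sum(axis=1, keepdims=True)
        # Blend this candidate's result using the per-point weights.
        shifted += weights[:, j:j + 1] * means
    return shifted


# Two well-separated pairs of points on a line.
example_points = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 0.0], [5.2, 0.0]])
example_bandwidths = [0.5, 10.0]

# Put all weight on the small bandwidth: each pair collapses to its own mean.
small_only = np.tile([1.0, 0.0], (4, 1))
shifted_small = dynamic_shift_step(example_points, example_bandwidths, small_only)

# Put all weight on the large bandwidth: every point moves to the global mean.
large_only = np.tile([0.0, 1.0], (4, 1))
shifted_large = dynamic_shift_step(example_points, example_bandwidths, large_only)
```

With all weight on the small bandwidth, the two pairs stay separate and contract toward their own centers; with all weight on the large bandwidth, the pairs merge. Choosing the weights per point is what would let a single module handle instances of different sizes and point densities.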

Full Text