Bridging the Gap Between Semantic Segmentation and Instance Segmentation

Chengxiang Yin, Zhiyuan Xu, Yanzhi Wang, Jian Tang, Tongtong Yuan

doi:10.1109/tmm.2021.3114541

Abstract

Fine-grained instance segmentation is considerably more complicated and challenging than semantic segmentation. Most existing instance segmentation methods focus only on accuracy and pay little attention to inference latency, which is critical to real-time applications such as autonomous driving. In this paper, we aim to bridge the gap between semantic segmentation and instance segmentation by presenting a novel real-time model for instance segmentation, Sem2Ins, which effectively generates instance boundaries from a semantic segmentation by leveraging conditional generative adversarial networks (cGANs) coupled with deep supervision and a weighted fusion layer. Specifically, supervision is imposed on each output layer, and features from different levels are fused to produce a well-generated boundary map. Sem2Ins has the following desirable features: 1) combined with some fast semantic segmentation methods, Sem2Ins runs at a real-time speed that is fairly well-balanced against accuracy; 2) Sem2Ins works flexibly with any semantic segmentation model for instance segmentation, and if the given semantic segmentation is sufficiently good, Sem2Ins even achieves state-of-the-art accuracy; 3) deep supervision and weighted fusion can be leveraged to generate high-quality boundaries; and 4) Sem2Ins can be easily extended to panoptic segmentation. Extensive experiments performed on the Cityscapes, WildDash, KITTI and COCO benchmarks have demonstrated that 1) Sem2Ins, when combined with PSPNet and DDRNet-23-Slim, consistently outperforms the state-of-the-art real-time solution (Box2Pix) in terms of both speed and accuracy; and 2) Sem2Ins combined with DPC performs comparably to some powerful detect-and-segment approaches.
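The combination of deep supervision and weighted fusion mentioned above can be illustrated with a minimal NumPy sketch. It is not the Sem2Ins implementation: the function names, the use of binary cross-entropy as the per-layer loss, and the fixed fusion weights are all assumptions made for illustration, and the cGAN adversarial terms are omitted entirely. The sketch only shows the general pattern in which each side output receives its own loss and a weighted sum of the side outputs forms the fused boundary map.

```python
import numpy as np


def sigmoid(x):
    """Map logits to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def bce(prob, target, eps=1e-7):
    """Mean binary cross-entropy (an assumed loss, not taken from the paper)."""
    prob = np.clip(prob, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(prob)
                          + (1.0 - target) * np.log(1.0 - prob)))


def fuse_side_outputs(side_logits, fusion_weights):
    """Weighted fusion: combine K side outputs of shape (K, H, W) into (H, W).

    In a trained model the weights would be learned; here they are given.
    """
    side_logits = np.asarray(side_logits, dtype=float)
    fusion_weights = np.asarray(fusion_weights, dtype=float)
    # Contract the K axis: sum_k w[k] * side_logits[k]
    return np.tensordot(fusion_weights, side_logits, axes=1)


def deeply_supervised_loss(side_logits, fusion_weights, target):
    """Deep supervision: one loss per side output plus one on the fused map."""
    side_losses = [bce(sigmoid(s), target) for s in side_logits]
    fused = fuse_side_outputs(side_logits, fusion_weights)
    fused_loss = bce(sigmoid(fused), target)
    return sum(side_losses) + fused_loss, side_losses, fused_loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three hypothetical side outputs for an 8x8 boundary map.
    logits = rng.normal(size=(3, 8, 8))
    boundary = (rng.random((8, 8)) > 0.8).astype(float)
    total, per_layer, fused = deeply_supervised_loss(
        logits, [0.2, 0.3, 0.5], boundary)
    print("total loss:", total)
```

Because every side output contributes its own term, gradients reach the shallow layers directly instead of only through the fused map, which is the usual motivation for deep supervision.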

Full Text