Abstract

Semantic segmentation is a well-studied topic and one of the most challenging tasks in computer vision, with applications such as autonomous driving. Deep learning approaches based on convolutional neural networks (CNNs) have demonstrated exceptional success on this task in recent years. Despite this success, existing approaches are plagued by higher-order inconsistencies between the ground-truth segmentation maps and those predicted by the segmentation model. This paper proposes a novel post-processing scheme based on adversarial learning to counter these inconsistencies. The scheme can be combined with a variety of existing CNN-based semantic segmentation networks to improve their segmentation performance. The proposed scheme is a Two-Stream Conditional Generative Adversarial Network (TScGAN): one stream takes the initial semantic segmentation masks predicted by an existing CNN, while the other utilizes scene images to retain high-level information under a supervised residual network structure. In addition, TScGAN incorporates a novel dynamic weighting mechanism, which leads to significant and consistent gains in segmentation performance. Comparative tests on public benchmark driving databases, including Cityscapes, Mapillary, and Berkeley DeepDrive (BDD100K), demonstrate the effectiveness of the proposed method when used with state-of-the-art CNN-based semantic segmentation models. Furthermore, ablation experiments confirm the soundness of the two-stream design. The code for TScGAN can be found at https://github.com/epan-utbm/TScGAN-for-Improving-Semantic-Predictions
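
The abstract does not give implementation details, but a minimal sketch of the two-stream refinement idea might look as follows in PyTorch. The layer sizes, the concatenation-based fusion, and the residual output head are illustrative assumptions, not the paper's actual architecture; they only convey how a generator could consume both an initial segmentation mask and the scene image and predict a correction rather than a full re-segmentation.

    # Minimal sketch, assuming a concatenation-based fusion and a residual
    # output head. This is NOT the TScGAN architecture from the paper; it
    # only illustrates the two-stream-with-residual idea from the abstract.
    import torch
    import torch.nn as nn

    class TwoStreamGenerator(nn.Module):
        def __init__(self, num_classes: int):
            super().__init__()
            # Stream 1: encodes the initial segmentation logits produced
            # by an existing CNN segmentation model.
            self.mask_encoder = nn.Sequential(
                nn.Conv2d(num_classes, 64, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            )
            # Stream 2: encodes the RGB scene image to retain high-level
            # contextual information.
            self.image_encoder = nn.Sequential(
                nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            )
            # Fusion head predicts a residual correction to the initial mask.
            self.fuse = nn.Sequential(
                nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(64, num_classes, 1),
            )

        def forward(self, init_logits: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
            fused = torch.cat(
                [self.mask_encoder(init_logits), self.image_encoder(image)], dim=1
            )
            # Residual structure: refine the initial prediction instead of
            # re-predicting the segmentation from scratch.
            return init_logits + self.fuse(fused)

    # Usage: refine a coarse prediction for a 19-class (e.g. Cityscapes) setup.
    generator = TwoStreamGenerator(num_classes=19)
    refined = generator(torch.randn(1, 19, 128, 256), torch.randn(1, 3, 128, 256))
    print(refined.shape)  # torch.Size([1, 19, 128, 256])

The residual formulation here mirrors the "supervised residual network structure" mentioned in the abstract: the generator only has to learn the inconsistencies between the initial mask and the ground truth, which is typically an easier target than producing the full segmentation.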
