Abstract

Semantic segmentation is paramount for autonomous vehicles to gain a deeper understanding of the surrounding traffic environment and enhance safety. Deep neural networks (DNNs) have achieved remarkable performance in semantic segmentation. However, training such a DNN requires a large amount of data labeled at the pixel level, and manually annotating dense pixel-level labels is a labor-intensive task. To tackle the problem of limited labeled data, deep domain adaptation (DDA) methods have recently been developed that exploit synthetic driving scenes to significantly reduce the manual annotation cost. Despite remarkable advances, these methods unfortunately suffer from a generalizability problem: they fail to provide a holistic representation of the mapping from the source image domain to the target image domain. In this article, we therefore develop a novel ensembled DDA approach that trains models with different upsampling strategies, discrepancy loss functions, and segmentation loss functions. The models are therefore complementary to each other and achieve better generalization in the target image domain. Such a design not only improves the adapted semantic segmentation performance but also strengthens model reliability and robustness. Extensive experimental results demonstrate the superiority of our approach over several state-of-the-art methods.

Highlights

  • A deep neural network (DNN) is powerful for extracting rich hierarchical feature representations [1, 2]

  • This paper develops a parallel generative ensembles method to improve the generalisation of semantic segmentation, so that a perception model trained on data generated by a simulator can generalise reliably to real-world scenarios

  • Multiple generative adversarial network (GAN) models are trained with various discrepancy and segmentation loss functions under different upsampling strategies to obtain diverse predictions (a sketch of combining such predictions follows this list)
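
The ensemble combines the per-pixel predictions of the individually adapted models. Below is a minimal sketch of one common way to fuse such diverse predictions, assuming PyTorch-style segmentation models that output per-pixel class logits; the function name and the uniform probability-averaging rule are illustrative assumptions, not necessarily the paper's exact fusion scheme.

```python
import torch
import torch.nn.functional as F


def ensemble_segment(models, image):
    """Fuse an ensemble of segmentation models by averaging per-pixel class probabilities.

    models: iterable of models mapping an image tensor to logits of shape (B, C, H, W)
    image:  input tensor of shape (B, 3, H, W)
    """
    probs = None
    with torch.no_grad():
        for model in models:
            model.eval()
            logits = model(image)              # (B, C, H, W) per-pixel class logits
            p = F.softmax(logits, dim=1)       # per-pixel class probabilities
            probs = p if probs is None else probs + p
    probs = probs / len(models)                # uniform average over the ensemble
    return probs.argmax(dim=1)                 # (B, H, W) predicted label map
```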


Summary

Introduction

A deep neural network (DNN) is powerful for extracting rich hierarchical feature representations [1, 2]. This superiority in feature extraction has helped DNN-based approaches achieve compelling results in semantic segmentation. Many studies, including U-Net [6] and SegNet [7], have extended the idea of the fully convolutional network (FCN) and achieved top performance in semantic segmentation. These methods, however, require a vast amount of labour-intensive work to densely label images at the pixel level. It takes about one and a half hours to annotate a single image from the Cityscapes dataset, which is unaffordable for most real-world applications.

