Abstract

Object detection and semantic segmentation are two fundamental techniques underlying a wide range of applications in Intelligent Vehicles (IV) and Advanced Driver Assistance Systems (ADAS). Early studies handled these two problems separately. In this paper, inspired by several recent works, we propose a deep neural network for joint object detection and semantic segmentation. Given an image, an encoder-decoder convolutional network extracts a set of feature maps, which are shared by a detection branch and a segmentation branch that jointly carry out object detection and semantic segmentation. In the detection branch, we design a PriorBox Initialization Mechanism (PBIM) to propose more object candidates. In the segmentation branch, we use Multi-Scale Atrous Convolution (MSAC) to exploit both global and local semantic information in traffic scenes. Benefiting from PBIM and MSAC, our model achieves competitive performance. In the experiments, we compare extensively with several recently proposed methods on the public Cityscapes dataset, achieving the highest accuracy. In addition, to verify the robustness and generalization of our model, extended experiments are also conducted on the well-known VOC2012 dataset.
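The segmentation branch described above samples the feature map at several dilation rates so that one module sees both fine local detail and wide context. The abstract does not specify the module's internals, so the following NumPy sketch is only an illustration of the general atrous (dilated) convolution idea, with hypothetical kernel sizes and rates; it is not the paper's actual MSAC implementation.

```python
import numpy as np

def atrous_conv2d(x, kernel, rate):
    """2D atrous (dilated) convolution, stride 1, 'same' zero padding.

    x      : (H, W) single-channel feature map
    kernel : (k, k) filter
    rate   : dilation rate; rate=1 is an ordinary convolution,
             larger rates enlarge the receptive field without
             adding parameters (hypothetical sketch, not the
             paper's exact MSAC module).
    """
    k = kernel.shape[0]
    pad = rate * (k - 1) // 2            # keeps output the same size as x
    xp = np.pad(x, pad, mode="constant")
    H, W = x.shape
    out = np.zeros((H, W), dtype=float)
    for i in range(H):
        for j in range(W):
            # Sample the padded input with a gap of `rate` between taps.
            patch = xp[i : i + rate * (k - 1) + 1 : rate,
                       j : j + rate * (k - 1) + 1 : rate]
            out[i, j] = np.sum(patch * kernel)
    return out

def multi_scale_atrous(x, kernel, rates=(1, 2, 4)):
    """Apply the same filter at several dilation rates and stack the
    responses, an ASPP-style multi-scale fusion (rates are assumed)."""
    return np.stack([atrous_conv2d(x, kernel, r) for r in rates])

if __name__ == "__main__":
    x = np.arange(25, dtype=float).reshape(5, 5)
    kernel = np.ones((3, 3)) / 9.0       # simple averaging filter
    responses = multi_scale_atrous(x, kernel)
    print(responses.shape)               # one response map per rate
```

In a real network the stacked responses would be fused by a 1x1 convolution and fed to the per-pixel classifier; here the stack simply makes the multi-rate sampling explicit.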
