Abstract

Learning-based multi-task models are widely used for scene understanding, where related tasks complement each other, e.g., prior semantic information can be exploited to better infer depth. We boost unsupervised monocular depth estimation by using semantic segmentation as an auxiliary task. To address the lack of cross-domain datasets and the catastrophic forgetting encountered in multi-task training, we leverage existing methods to obtain redundant segmentation maps and build a cross-domain dataset, which not only provides a new way to conduct multi-task training but also lets us evaluate our results against those of other algorithms. In addition, to make comprehensive use of the features extracted by the two tasks in the early perception stage, we share weights within the network to fuse cross-domain features, and we introduce a novel multi-task loss function that further smooths the depth values. Extensive experiments on the KITTI and Cityscapes datasets show that our method achieves state-of-the-art performance on depth estimation and also improves semantic segmentation.
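As an illustration of how such a multi-task loss can couple the two tasks, the sketch below combines a photometric reconstruction term, an edge-aware depth-smoothness term, and a segmentation cross-entropy term. The abstract does not give the paper's exact formulation, so the smoothness term, the `lambda_smooth`/`lambda_seg` weights, and all function names here are assumptions, following common practice in unsupervised depth estimation (e.g., the edge-aware smoothness of Godard et al.).

```python
# Hypothetical sketch of a joint depth + segmentation training loss;
# not the paper's exact formulation.
import torch
import torch.nn.functional as F

def edge_aware_smoothness(disp, img):
    """Penalize disparity gradients, down-weighted at image edges."""
    # Normalize disparity to avoid trivially shrinking it to zero.
    mean_disp = disp.mean(dim=(2, 3), keepdim=True)
    disp = disp / (mean_disp + 1e-7)

    # First-order gradients of disparity and image.
    grad_disp_x = (disp[:, :, :, :-1] - disp[:, :, :, 1:]).abs()
    grad_disp_y = (disp[:, :, :-1, :] - disp[:, :, 1:, :]).abs()
    grad_img_x = (img[:, :, :, :-1] - img[:, :, :, 1:]).abs().mean(1, keepdim=True)
    grad_img_y = (img[:, :, :-1, :] - img[:, :, 1:, :]).abs().mean(1, keepdim=True)

    # Allow depth discontinuities where the image has strong edges.
    grad_disp_x = grad_disp_x * torch.exp(-grad_img_x)
    grad_disp_y = grad_disp_y * torch.exp(-grad_img_y)
    return grad_disp_x.mean() + grad_disp_y.mean()

def multi_task_loss(photo_loss, disp, img, seg_logits, seg_target,
                    lambda_smooth=1e-3, lambda_seg=0.1):
    """Combine photometric, smoothness, and segmentation terms.

    photo_loss: scalar photometric reprojection loss (computed elsewhere)
    disp:       (B, 1, H, W) predicted disparity
    img:        (B, 3, H, W) input image
    seg_logits: (B, C, H, W) segmentation logits; seg_target: (B, H, W) labels
    """
    smooth = edge_aware_smoothness(disp, img)
    seg = F.cross_entropy(seg_logits, seg_target)
    return photo_loss + lambda_smooth * smooth + lambda_seg * seg
```

In practice, `photo_loss` would be the standard photometric reprojection error between the warped source frame and the target frame; the segmentation term here stands in for supervision from the generated segmentation maps.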
