Abstract

Semantic segmentation and depth estimation are two fundamental problems in computer vision. They are usually studied separately. However, in some scenarios, such as autonomous driving, both must be performed at the same time. Moreover, the two tasks share interconnected information that can improve the performance of both. We therefore address them with multi-task learning, training the tasks jointly and producing both predictions together. In this paper, we build the Interactive Information Multi-Task Network (IIMT-Net), which incorporates information interaction modules and is trained with a proposed task-balancing strategy. Specifically, we base the main body of the encoder and decoder on the Transformer to capture global information. To better exploit the interaction between the two tasks, we also add information fusion modules to the two sub-decoders. In addition, the task-balancing strategy, Poly-1 weights, is designed to balance samples of different difficulty so that the network is not severely biased towards either task. Extensive experiments on the NYU Depth V2, Cityscapes, and SUN RGB-D datasets demonstrate the effectiveness of the proposed approach. Our model predicts semantic segmentation and depth estimation together and obtains mIoU values of 46.66% on NYU Depth V2, 66.37% on Cityscapes, and 49.89% on SUN RGB-D, with corresponding RMSE values of 0.648, 6.630, and 0.401 for depth estimation, outperforming most existing multi-task learning methods.
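
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how mIoU and RMSE are commonly computed. It is not the authors' evaluation code. The function names `mean_iou` and `rmse`, the flattened-list inputs, the optional `ignore_label`, and the convention that ground-truth depths of zero or less mark invalid pixels are all assumptions made for illustration. Benchmark mIoU is usually accumulated over a whole dataset, whereas this sketch simply scores whatever labels it is given.

```python
import math


def mean_iou(pred, target, num_classes, ignore_label=None):
    """Mean intersection-over-union across classes.

    pred, target: flattened sequences of integer class labels (assumed layout).
    Classes that appear in neither pred nor target are skipped.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_label:
            continue  # unlabeled pixel, excluded from the score
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


def rmse(pred_depth, true_depth):
    """Root-mean-square error over pixels with valid ground truth.

    Assumption: a ground-truth depth <= 0 marks a missing measurement.
    """
    valid = [(p, t) for p, t in zip(pred_depth, true_depth) if t > 0]
    if not valid:
        return float("nan")
    return math.sqrt(sum((p - t) ** 2 for p, t in valid) / len(valid))


if __name__ == "__main__":
    # Toy inputs, not data from the paper.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
    print(rmse([1.0, 2.0, 5.0], [1.0, 4.0, 0.0]))
```

Higher mIoU is better for segmentation, while lower RMSE is better for depth. RMSE is expressed in the depth units of each dataset, so values are not directly comparable across datasets.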
