Abstract

Visual perception plays a vital role in autonomous driving systems, demanding high accuracy and real-time inference speed to ensure safety. In this paper, we propose a multi-task framework that simultaneously performs object detection, drivable area segmentation, and lane line identification, addressing the requirements of accurate and efficient visual perception. Our approach uses a shared-encoder architecture with three separate decoders, one per task. We investigate three configurations for the shared encoder: a Convolutional Neural Network (CNN), a Pyramid Vision Transformer (PVT), and a hybrid CNN+PVT model. Through extensive experimentation and comparative analysis on the challenging BDD100K dataset, we evaluate the performance of these shared-encoder models and provide insights into their strengths and weaknesses. Our research contributes to the advancement of multi-task visual perception for autonomous driving systems by achieving competitive results in terms of accuracy and efficiency. The source code is publicly available on GitHub to facilitate further research in this domain.
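
The sketch below illustrates the shared-encoder, three-decoder layout described above. It is a minimal PyTorch illustration only: the class name, head structures, channel counts, and the toy encoder are assumptions for demonstration, not the authors' actual implementation; the real encoder would be the CNN, PVT, or hybrid variant evaluated in the paper.

```python
import torch
import torch.nn as nn

class MultiTaskPerceptionNet(nn.Module):
    """Illustrative shared-encoder network with three task-specific decoders."""

    def __init__(self, encoder: nn.Module, feat_channels: int = 256, det_outputs: int = 85):
        super().__init__()
        # Shared encoder: stands in for the CNN, PVT, or hybrid CNN+PVT backbone.
        self.encoder = encoder
        # Placeholder decoder heads, one per task.
        self.detect_head = nn.Conv2d(feat_channels, det_outputs, kernel_size=1)
        self.drivable_head = nn.Sequential(
            nn.Conv2d(feat_channels, 2, kernel_size=1),   # drivable vs. background
            nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False),
        )
        self.lane_head = nn.Sequential(
            nn.Conv2d(feat_channels, 2, kernel_size=1),   # lane line vs. background
            nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False),
        )

    def forward(self, x: torch.Tensor):
        feats = self.encoder(x)                           # features shared by all tasks
        return {
            "detection": self.detect_head(feats),
            "drivable_area": self.drivable_head(feats),
            "lane_line": self.lane_head(feats),
        }

# Toy CNN encoder used only to make the sketch runnable (stride-8 feature map).
toy_encoder = nn.Sequential(
    nn.Conv2d(3, 256, kernel_size=3, stride=8, padding=1),
    nn.ReLU(),
)
model = MultiTaskPerceptionNet(toy_encoder)
outputs = model(torch.randn(1, 3, 384, 640))
print({k: tuple(v.shape) for k, v in outputs.items()})
```

A single forward pass through the shared encoder feeds all three heads, which is the efficiency argument for this design: the detection, drivable-area, and lane-line decoders add only lightweight task-specific computation on top of the common features.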
