Abstract
Modern autonomous vehicles must perform various visual perception tasks for scene construction and motion decision. Multiobject tracking and instance segmentation (MOTS) are the main tasks, since they directly influence the steering and braking of the car. Implementing both tasks in a single multitask learning neural network presents significant challenges in performance and complexity. Current work on MOTS focuses on improving the precision of the network with two-stage tracking-by-detection models, which struggle to satisfy the real-time requirements of autonomous vehicles. In this article, a real-time multitask network named YolTrack, based on a one-stage instance segmentation model, is proposed to perform the MOTS task, achieving an inference speed of 29.5 frames per second (fps) with only a slight drop in accuracy and precision. YolTrack uses ShuffleNet V2 with a feature pyramid network (FPN) as its backbone, from which two decoders are extended to generate instance segments and embedding vectors. Segmentation masks are used to improve tracking performance by performing a logical AND operation with the feature maps, showing that foreground segmentation plays an important role in object tracking. The different scales of the multiple task losses are balanced by an optimized geometric mean loss during the training phase. Experimental results on the KITTI MOTS data set show that YolTrack outperforms other state-of-the-art MOTS architectures in real-time performance and is appropriate for deployment in autonomous vehicles.
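The two mechanisms named in the abstract, masking feature maps with a segmentation mask and balancing task losses with a geometric mean, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `apply_mask` and `geometric_mean_loss`, the toy 2x2 values, and the two loss values are all hypothetical, and the plain geometric mean shown here stands in for the "optimized" variant, whose exact form the abstract does not give.

```python
import math


def apply_mask(feature_map, mask):
    """Keep a feature value only where the binary mask is 1 (foreground).

    Assumption: the logical AND between mask and features is modelled as
    zeroing out background positions of a single-channel feature map.
    """
    return [
        [f if m else 0.0 for f, m in zip(f_row, m_row)]
        for f_row, m_row in zip(feature_map, mask)
    ]


def geometric_mean_loss(losses):
    """Plain geometric mean of positive task losses, computed in log space.

    Assumption: this is the unmodified geometric mean, not the paper's
    optimized version.
    """
    if not losses or any(l <= 0 for l in losses):
        raise ValueError("losses must be a non-empty sequence of positive numbers")
    return math.exp(sum(math.log(l) for l in losses) / len(losses))


# Hypothetical toy data: a 2x2 feature map and a foreground mask.
features = [[0.5, 1.2], [0.3, 2.0]]
mask = [[1, 0], [0, 1]]
masked = apply_mask(features, mask)

# Hypothetical segmentation and embedding losses on very different scales.
seg_loss, emb_loss = 4.0, 0.25
total = geometric_mean_loss([seg_loss, emb_loss])
```

Because the geometric mean multiplies the losses rather than summing them, a task whose loss is numerically large does not dominate the combined value the way it would in an unweighted sum, which is the balancing effect the abstract refers to.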
More From: IEEE Transactions on Neural Networks and Learning Systems