Abstract

The paper addresses multiple object tracking and segmentation in monocular video captured by the camera of an unmanned ground vehicle. The authors investigate various deep neural network architectures for this task, paying special attention to models that provide real-time inference. They propose an approach that combines the SOLOv2 instance segmentation model, a neural network that generates an embedding for each detected object, and a modified Hungarian tracking algorithm. The modification takes into account geometric constraints on the positions of detected objects across the image sequence. The investigated solution develops and improves on the state-of-the-art PointTrack method. The effectiveness of the proposed approach is demonstrated quantitatively and qualitatively on the popular KITTI MOTS dataset, which was collected using the cameras of a driverless car. The approach was implemented in software, and the formation of a two-dimensional point cloud within each detected image segment was accelerated using NVIDIA CUDA. On an NVIDIA Tesla V100 GPU, the proposed instance segmentation module has a mean processing time of 68 ms per image, and the embedding and tracking module takes 24 ms. This indicates that the proposed solution is promising for on-board computer vision systems on both unmanned vehicles and various robotic platforms.

Highlights

  • The multiple object tracking (MOT) task is important for a large number of applications

  • The approach developed in this paper makes the following contributions:
      - improvements to the PointTrack method (Xu et al., 2020), which consist of replacing the base instance segmentation model with the high-speed SOLOv2 model (Wang et al., 2020) and modifying the model that creates an embedding for each detected object so that it takes the object's category into account;
      - a modification of the Hungarian algorithm that takes into account a geometric constraint on the detected objects across an image sequence;
      - a software implementation of the approach, including acceleration of the two-dimensional point cloud formation within a detected image segment using NVIDIA CUDA

  • Experiments were performed on a workstation with an Intel Xeon 6154 CPU (32×3 GHz) and an NVIDIA Tesla V100 GPU (32 GB)


Summary

Introduction

The multiple object tracking (MOT) task is important for a large number of applications. Because 2D image recognition methods are often faster than methods that operate on 3D point clouds, we chose instance segmentation on monocular video for object tracking. The approach developed in this paper makes the following contributions:
- improvements to the PointTrack method (Xu et al., 2020), which consist of replacing the base instance segmentation model with the high-speed SOLOv2 model (Wang et al., 2020) and modifying the model that creates an embedding for each detected object so that it takes the object's category into account;
- a modification of the Hungarian algorithm that takes into account a geometric constraint on the detected objects across an image sequence;
- a software implementation of the approach, including acceleration of the two-dimensional point cloud formation within a detected image segment using NVIDIA CUDA.
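To make the idea of a geometrically constrained Hungarian assignment more concrete, the following is a minimal sketch and not the authors' implementation. This summary does not specify the form of the geometric constraint, so the sketch assumes a simple gate: a track and a detection may be matched only if their centroids are at most `max_shift` pixels apart. The cosine distance between embeddings, the `BIG` sentinel cost, and the names `hungarian`, `cosine_distance`, and `associate` are likewise illustrative assumptions.

```python
import math

BIG = 1e6  # finite stand-in cost for forbidden pairs (assumption)


def hungarian(cost):
    """Min-cost assignment for a matrix with rows <= columns.

    Returns a list whose entry i is the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    u = [0.0] * (n + 1)  # row potentials
    v = [0.0] * (m + 1)  # column potentials
    p = [0] * (m + 1)    # p[j]: 1-based row matched to column j, 0 if free
    way = [0] * (m + 1)  # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [math.inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], math.inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip matches along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


def associate(tracks, detections, max_shift=50.0):
    """Match tracks to detections; each item is (centroid_xy, embedding).

    Pairs whose centroids are more than max_shift pixels apart are
    forbidden (hypothetical gate). Returns (track_idx, detection_idx) pairs.
    """
    if not tracks or not detections:
        return []
    width = max(len(tracks), len(detections))
    cost = []
    for t_centroid, t_emb in tracks:
        row = []
        for d_centroid, d_emb in detections:
            gated = math.dist(t_centroid, d_centroid) > max_shift
            row.append(BIG if gated else cosine_distance(t_emb, d_emb))
        # Pad with dummy columns so that rows <= columns always holds.
        row += [BIG] * (width - len(detections))
        cost.append(row)
    assignment = hungarian(cost)
    return [(i, j) for i, j in enumerate(assignment)
            if j < len(detections) and cost[i][j] < BIG]
```

Under these assumptions, a track whose best embedding match lies far away in the image stays unmatched instead of being forced onto a distant detection. A finite sentinel is used rather than infinity so that the potential updates inside `hungarian` stay well defined.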

