Abstract

Multi-object tracking (MOT) has long been a central problem in autonomous driving and security monitoring. As bounding-box-based MOT algorithms have saturated in recent years, a newer task, multi-object tracking and segmentation (MOTS), tracks objects with instance segmentation masks, providing a finer level of scene understanding and potential improvements in tracking accuracy. In this paper, we introduce a video-based MOTS framework named DIstill Observations to Representations (DIOR). A feature distiller is designed to extract and balance comprehensive object representations: 1) the temporal distiller aggregates context information across frames for feature consistency and smooth predictions over time; 2) the spatial distiller focuses on the target of interest within each bounding box, removing ambiguous and irrelevant background from the learned features. The subsequent tracking steps start with Hungarian matching based on feature similarity and mask continuity, which is efficient and straightforward. In addition, we propose short-term retrieval (STR) and long-term re-identification (re-ID) modules to avoid missed associations caused by detection failures or occlusion. Our method achieves state-of-the-art performance on both the MOTS20 and KITTI-MOTS benchmarks.
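
The association step described above can be illustrated with a minimal sketch. It is not the DIOR implementation: the abstract does not say how feature similarity and mask continuity are measured or combined, so this sketch assumes cosine similarity between embeddings, mask IoU as a stand-in for mask continuity, and a simple weighted sum. The names `associate`, `mask_iou`, `w_feat`, and `min_score` are hypothetical. The assignment problem is solved with SciPy's `linear_sum_assignment`, which returns the same optimal matching a Hungarian solver would.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def mask_iou(a, b):
    """Intersection over union of two boolean masks of equal shape."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0


def associate(track_feats, track_masks, det_feats, det_masks,
              w_feat=0.5, min_score=0.3):
    """Match existing tracks to new detections.

    Returns a list of (track_index, detection_index) pairs.
    w_feat and min_score are illustrative values, not taken from the paper.
    """
    if len(track_feats) == 0 or len(det_feats) == 0:
        return []
    # Assumption: feature similarity = cosine similarity of the embeddings.
    t = track_feats / np.linalg.norm(track_feats, axis=1, keepdims=True)
    d = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
    feat_sim = t @ d.T
    # Assumption: mask continuity = IoU between previous and current masks.
    iou = np.array([[mask_iou(tm, dm) for dm in det_masks]
                    for tm in track_masks])
    score = w_feat * feat_sim + (1.0 - w_feat) * iou
    # Optimal one-to-one assignment that maximises the total score.
    rows, cols = linear_sum_assignment(score, maximize=True)
    # Discard weak pairs so unmatched tracks and detections stay unmatched.
    return [(int(r), int(c)) for r, c in zip(rows, cols)
            if score[r, c] >= min_score]
```

Tracks and detections left unmatched by such a step are the cases that modules like STR and re-ID are meant to recover; how DIOR does so is not specified in the abstract.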
