Abstract

In the field of multi-target tracking, the widely adopted tracking-by-detection paradigm has progressed rapidly thanks to improvements in detectors and matching techniques. By contrast, the joint detection and tracking paradigm remains relatively underdeveloped and struggles to model complex scenes, such as those involving camera motion and occlusion. In this work, a hierarchical joint detection and tracking framework, named MSPNet, is proposed. From a temporal perspective, a motion-guided feature aggregation module is proposed to handle the complexities of multi-frame variation. From a spatial perspective, an occlusion-aware head and hierarchical spatial association are proposed to address the challenges of occlusion. Extensive experiments on challenging MOT benchmarks demonstrate that MSPNet effectively reduces false negatives and improves tracking accuracy, outperforming a wide range of existing methods.
