Abstract

In multi-target tracking, the widely adopted tracking-by-detection paradigm has progressed rapidly with the refinement of detectors and matching techniques. In contrast, the joint detection and tracking paradigm remains relatively limited and struggles to model complex scenes, such as those involving camera motion and occlusion. In this work, a hierarchical joint detection and tracking framework, MSPNet, is proposed. From a temporal perspective, a motion-guided feature aggregation module is proposed to address multi-frame variations. From a spatial perspective, an occlusion-aware head and hierarchical spatial association are proposed to handle occlusion. Extensive experiments on challenging MOT benchmarks demonstrate that MSPNet effectively reduces false negatives and improves tracking accuracy while outperforming a wide range of existing methods.

