Although numerous data association methods have been proposed for Multiple Object Tracking (MOT), how to integrate different features during data association remains an open problem. For instance, over-relying on the motion feature may fail to produce the necessary associations when object movements are complicated, while depending solely on the appearance feature can yield incorrect associations when objects within a frame look alike. To strike a better trade-off between the appearance feature and the motion feature, we re-design their integration. In our online approach, the location and motion of each object are cast into an adaptive search window, and within that window matching depends only on the similarity of appearance features. In our offline approach, tracklets generated by the online approach are refined by expressing the motion feature as spatiotemporal constraints and using appearance for clustering. We conduct experiments on multiple MOT datasets covering diverse conditions, including varying motion speeds, illumination conditions, and object categories, and verify that our method achieves robust performance. Moreover, the method further improves our previous 1st-place solutions in two CVPR 2020 MOT challenges.

• Re-designing a novel integration of motion and appearance to tackle MOT with complicated motions
• Reaching robust performance under varying motion speeds, illumination conditions, and object categories
• Improving two 1st-place solutions of the CVPR’20 WAD MOT Challenge and the CVPR’20 MOTS Challenge
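The core idea of the online approach — gate candidates by a motion-derived adaptive search window, then associate purely on appearance similarity inside the window — can be illustrated with a minimal sketch. The abstract gives no implementation details, so the window formula (a scale factor times object speed), the greedy assignment, and all field names here are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def match_within_windows(tracks, detections, window_scale=2.0):
    """Illustrative sketch: per-track adaptive search window from motion,
    then greedy matching on appearance (cosine similarity) only.

    tracks: list of dicts with 'pos' (x, y), 'speed' (scalar), 'feat' (np.ndarray)
    detections: list of dicts with 'pos' and 'feat'
    Returns a list of (track_index, detection_index) pairs.
    """
    matches, used = [], set()
    for ti, trk in enumerate(tracks):
        # Assumed window: radius grows with the object's speed (faster
        # objects get a larger search region).
        radius = window_scale * max(trk["speed"], 1.0)
        best, best_sim = None, -1.0
        for di, det in enumerate(detections):
            if di in used:
                continue
            if np.linalg.norm(np.subtract(det["pos"], trk["pos"])) > radius:
                continue  # outside the search window: motion gates the candidate out
            f1, f2 = trk["feat"], det["feat"]
            # Inside the window, only appearance similarity matters.
            sim = float(f1 @ f2 / (np.linalg.norm(f1) * np.linalg.norm(f2)))
            if sim > best_sim:
                best, best_sim = di, sim
        if best is not None:
            matches.append((ti, best))
            used.add(best)
    return matches
```

A real tracker would typically replace the greedy loop with Hungarian assignment and derive the window from a motion model such as a Kalman filter; the sketch only shows how motion gating and appearance matching divide the work.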