Abstract

Supervised one-shot multi-object tracking (MOT) algorithms achieve satisfactory performance, but they depend on large amounts of labeled data. In real applications, collecting that much manual annotation is laborious and often impractical. A one-shot MOT model trained on a labeled domain therefore needs to be adapted to an unlabeled domain, yet such domain adaptation is challenging: the model has to detect and associate multiple moving objects distributed across various spatial locations, while style, object identity, object quantity, and object scale differ markedly between domains. Motivated by this, we propose a novel inference-domain network evolution scheme to enhance the generalization ability of the one-shot MOT model. Specifically, we design a spatial topology-based one-shot network (STONet) to perform the one-shot MOT task, in which a self-supervision mechanism encourages the feature extractor to learn spatial contexts without any annotated information. Furthermore, we propose a temporal identity aggregation (TIA) module that helps STONet weaken the adverse effects of noisy labels during network evolution. TIA aggregates historical embeddings that share the same identity in order to learn cleaner and more reliable pseudo labels. In the inference domain, STONet with TIA progressively collects pseudo labels and updates its parameters, so that the network evolves from the labeled source domain to the unlabeled inference domain. Extensive experiments and ablation studies conducted on MOT15, MOT17, and MOT20 demonstrate the effectiveness of the proposed model.
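
The abstract does not say how TIA combines historical embeddings, so the following is only a minimal sketch of the general idea of aggregating embeddings that share an identity. It assumes a plain mean over a bounded window of L2-normalized embeddings; the class `IdentityBank`, its methods, and the `max_history` parameter are hypothetical names introduced for illustration and are not taken from the paper.

```python
import math
from collections import defaultdict


def l2_normalize(vec):
    """Scale a vector to unit length; return a copy unchanged if it is all zeros."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


class IdentityBank:
    """Hypothetical per-identity embedding store (an assumption, not the paper's TIA)."""

    def __init__(self, max_history=30):
        if max_history < 1:
            raise ValueError("max_history must be at least 1")
        self.max_history = max_history
        self.history = defaultdict(list)

    def add(self, track_id, embedding):
        """Append a normalized embedding and keep only the most recent ones."""
        items = self.history[track_id]
        items.append(l2_normalize(embedding))
        # Drop the oldest entries once the window is full.
        del items[:-self.max_history]

    def aggregate(self, track_id):
        """Return the normalized mean of the stored embeddings for one identity."""
        items = self.history.get(track_id)
        if not items:
            raise KeyError(track_id)
        dim = len(items[0])
        mean = [sum(e[i] for e in items) / len(items) for i in range(dim)]
        return l2_normalize(mean)
```

Averaging over several frames is one simple way to damp the effect of a single noisy observation on the representation used for an identity; the actual TIA module may weight or select embeddings differently.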
