Abstract

Multi-object tracking (MOT) has wide applications in the fields of video analysis and signal processing. A major challenge in MOT is how to associate the noisy detections into long and continuous trajectories. In this letter, we address the association problem at the tracklet-level, and mainly focus on the appearance representation designed for tracklets. A multitask convolutional neural network is proposed to learn the discriminative features and spatial-temporal attentions jointly. In particular, we decompose an object in a static image with spatial attentions, and then aggregate multiple features in a tracklet based on the temporal attentions. Appearance misalignment that caused by occlusion and inaccurate bounding is then mitigated by multi-feature aggregation. Experimental results on two challenging MOT benchmarks have demonstrated the effectiveness of the proposed method and shown significant improvement on the quality of tracking identities.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call