Abstract
Tracking multiple objects in a video sequence can be accomplished by identifying the objects appearing in the sequence and distinguishing between them. Therefore, many recent multi-object tracking (MOT) methods have utilized re-identification and distance metric learning to distinguish between objects by computing similarity/dissimilarity scores. However, it is difficult to generalize such approaches to arbitrary video sequences, because some important information, such as the number of objects (classes) in a video, is not known in advance. Therefore, in this study, we applied a one-shot learning framework to the MOT problem. Our algorithm tracks objects by classifying newly observed objects into existing tracks, irrespective of the number of objects appearing in a video frame. The proposed method, called OneShotDA, exploits a one-shot learning framework based on an attention mechanism. Our neural network learns to classify unseen data samples using labels from a support set. Once the network has been trained, it predicts correct labels for newly received detection results based on the set of existing tracks. To analyze the effectiveness of our method, it was tested on the MOTChallenge benchmark datasets (MOT16 and MOT17). The results reveal that the performance of the proposed method is comparable to that of current state-of-the-art methods. In particular, it is noteworthy that the proposed method ranked first among the online trackers on the MOT17 benchmark.
Highlights
Multi-object tracking (MOT) is considered one of the most challenging problems in computer vision research
We report mostly tracked (MT) objects, mostly lost (ML) objects, the total number of false positives (FP), false negatives (FN), and identity switches (IDsw), and the total number of times a trajectory is fragmented (Frag)
We compare the results to JBNOT [20], which achieved the top rank on the MOT17 dataset in terms of MOT accuracy (MOTA) but with inferior ID-switch performance compared to our tracker (Table 5)
Summary
Multi-object tracking (MOT) is considered one of the most challenging problems in computer vision research. We propose a novel data association strategy called OneShotDA that exploits one-shot learning frameworks such as those in [14]–[16]. In such frameworks, the class of a query sample is determined by the samples in a gallery (support) set. Following this protocol, our method classifies each newly received detection result (query sample) into an existing track (gallery set) using one-shot classification. It is noteworthy that the proposed method ranks first among online trackers when evaluated on the MOT17 benchmark.
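The association step described above can be illustrated with a minimal sketch. The snippet below shows a generic attention-based one-shot classifier in the matching-networks style: a query detection embedding attends over the support set of track embeddings, and attention mass is summed per track label. All function and variable names here are illustrative assumptions, not taken from the paper's actual implementation, which uses a trained neural network rather than fixed cosine attention.

```python
import numpy as np

def one_shot_associate(query, support, support_labels):
    """Assign a query embedding to one of the existing tracks by
    attending over the support set (illustrative sketch only).

    query          : (d,) embedding of the new detection
    support        : (n, d) embeddings gathered from existing tracks
    support_labels : (n,) track ID for each support embedding
    """
    # Cosine similarity between the query and each support embedding.
    q = query / np.linalg.norm(query)
    s = support / np.linalg.norm(support, axis=1, keepdims=True)
    sims = s @ q
    # Softmax attention over the support set (shifted for stability).
    attn = np.exp(sims - sims.max())
    attn /= attn.sum()
    # Weighted vote: total attention mass assigned to each track label.
    scores = {int(l): attn[support_labels == l].sum()
              for l in np.unique(support_labels)}
    return max(scores, key=scores.get)
```

Because the decision is made relative to whatever support set is supplied, the same classifier handles any number of tracks, which mirrors the paper's point that the number of objects need not be known in advance.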