Unifying object detection and re-identification (ReID) into a single network enables faster multi-object tracking (MOT), while this multi-task setting poses challenges for training. In this work, we dissect the joint training of detection and ReID from two dimensions: label assignment and loss function. We find previous works generally overlook them and directly borrow the practices from object detection, inevitably causing inferior performance. Specifically, we identify a qualified label assignment for MOT should: 1) have the assignment cost aware of ReID cost, not just detection cost; 2) provide sufficient positive samples for robust feature learning while avoiding ambiguous positives (i.e., the positives shared by different ground-truth objects). To achieve the above goals, we first propose Identity-aware Label Assignment, which jointly considers the assignment cost of detection and ReID to select positive samples for each instance without ambiguities. Moreover, we advance a novel Discriminative Focal Loss that integrates ReID predictions with Focal Loss to focus the training on the discriminative samples. Finally, we upgrade the strong baseline FairMOT with our techniques and achieve up to 7.0 MOTA / 54.1% IDs improvements on MOT16/17/20 benchmarks under favorable inference speed, which verifies our tailored label assignment and loss function for MOT are superior to those inherited from object detection.
Read full abstract