Abstract

Multiple object tracking (MOT) generally follows the tracking-by-detection paradigm, in which object detection and object tracking are conventionally executed by separate systems. Recent progress in MOT has focused on detecting and tracking objects by harnessing the representational power of deep learning. Because such methods combine the two submodules in the same network, it is particularly important that the submodules be trained effectively together. The development of a suitable network architecture for the end-to-end joint training of the detection and tracking submodules therefore remains a challenging issue. The present work addresses this issue by proposing a novel architecture, denoted YOLOTracker, that performs online MOT by exploiting a joint detection and embedding network. First, an efficient and powerful joint detection and tracking model is constructed to accomplish instance-level embedded training, which enables the proposed tracker to achieve highly accurate MOT results with high efficiency. Then, the Path Aggregation Network is employed to combine low-resolution and high-resolution features, integrating textural features with semantic information and mitigating the misalignment of the re-identification features. Experiments are conducted on three challenging, publicly available benchmark datasets, and the results demonstrate that the proposed tracker outperforms other state-of-the-art MOT trackers in terms of both accuracy and efficiency.
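
To make concrete how a joint detection and embedding network can support online tracking, the following is a minimal sketch of one common way to use per-detection appearance embeddings: matching the detections in a new frame to existing tracks by cosine similarity. The abstract does not specify YOLOTracker's association procedure, so the greedy matching rule, the function names `cosine_similarity` and `associate`, and the `min_similarity` threshold are all illustrative assumptions rather than the authors' method.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def associate(track_embeddings, detection_embeddings, min_similarity=0.5):
    """Greedily match tracks to detections by descending similarity.

    track_embeddings: dict mapping track id -> embedding vector
    detection_embeddings: list of embedding vectors for the new frame
    min_similarity: hypothetical threshold below which no match is made

    Returns (matches, unmatched), where matches maps track id -> detection
    index and unmatched lists detection indices left without a track.
    """
    # Score every track/detection pair that clears the threshold.
    candidates = []
    for track_id, track_vec in track_embeddings.items():
        for det_idx, det_vec in enumerate(detection_embeddings):
            score = cosine_similarity(track_vec, det_vec)
            if score >= min_similarity:
                candidates.append((score, track_id, det_idx))

    # Take the most similar pairs first; each track and detection is used once.
    candidates.sort(key=lambda c: c[0], reverse=True)
    matches = {}
    used_detections = set()
    for _, track_id, det_idx in candidates:
        if track_id in matches or det_idx in used_detections:
            continue
        matches[track_id] = det_idx
        used_detections.add(det_idx)

    unmatched = [i for i in range(len(detection_embeddings))
                 if i not in used_detections]
    return matches, unmatched
```

In a sketch like this, unmatched detections would typically start new tracks. Practical trackers often replace the greedy step with an optimal assignment and add motion cues, neither of which is shown here.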
