Abstract

We study cross-modality person re-identification (ReID) and tracking with dual visible-infrared (VI) cameras, whereas most existing tracking-by-detection efforts focus on single-modality visible ReID, which fails in poorly lit environments. The main difficulties of cross-modality (e.g., VI) ReID stem from the large modality gap between three-channel visible images and one-channel infrared images, and from unknown environmental factors such as background clutter and occlusions. To tackle these issues, we enrich the diversity of visible and infrared images for intra- and cross-modality matching by combining channel-aware data augmentation (DA) techniques (e.g., channel-exchange augmentation and random occlusions) with standard DA techniques. On top of these DA techniques, we build the feature extraction backbone from ResNet50 and a vision transformer (ViT) and apply the dynamic weight average (DWA) strategy to learn the loss weights, treating the minimization of the identity loss and the triplet loss as a multitask learning problem. We then apply the proposed ReID approach to person tracking in the field of interest. Experiments on two public data sets, RegDB and SYSU-MM01, show that our approach improves state-of-the-art rank-1, mAP, and mINP for cross-modality matching. In addition, experiments on our own data set show that tracking by VI-ReID with dual VI cameras achieves an accuracy of around 0.24 m.
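The channel-aware augmentations mentioned above can be illustrated with a minimal sketch. The exact policies used in the paper are not specified in the abstract, so the following is an assumption-laden illustration: channel exchange randomly permutes the RGB channels (or collapses them to one channel, mimicking the single-channel statistics of infrared images), and random occlusion zeroes out a random rectangle; the function names, probabilities, and size bounds are all illustrative choices, not the authors' exact settings.

```python
import numpy as np

def channel_exchange(img, rng=None):
    """Sketch of channel-level augmentation: with probability 0.5, randomly
    permute (exchange) the three channels of a visible image; otherwise
    broadcast one randomly chosen channel to all three, imitating a
    one-channel infrared image. Policy details are assumptions."""
    rng = rng or np.random.default_rng()
    h, w, c = img.shape
    assert c == 3, "expects an HxWx3 visible image"
    if rng.random() < 0.5:
        perm = rng.permutation(3)          # exchange: random channel order
        return img[:, :, perm]
    k = rng.integers(3)                    # collapse: one channel everywhere
    return np.repeat(img[:, :, k:k + 1], 3, axis=2)

def random_occlusion(img, max_frac=0.3, rng=None):
    """Sketch of a random-erasing-style occlusion: zero out one random
    rectangle whose sides cover up to max_frac of the image sides."""
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]
    rh = rng.integers(1, max(2, int(h * max_frac)))
    rw = rng.integers(1, max(2, int(w * max_frac)))
    y = rng.integers(0, h - rh + 1)
    x = rng.integers(0, w - rw + 1)
    out = img.copy()
    out[y:y + rh, x:x + rw] = 0
    return out
```

Both transforms preserve image shape, so they can be mixed freely with standard DA (flips, crops) in a training pipeline.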
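The DWA strategy referenced above (dynamic weight average, from the multitask learning literature) can also be sketched. Under DWA, each task's weight follows a softmax over the ratio of its recent losses, so tasks whose loss is decreasing more slowly get larger weights; here the two tasks would be the identity loss and the triplet loss. The function name, the two-epoch interface, and the temperature default are illustrative assumptions, not the paper's exact implementation.

```python
import math

def dwa_weights(loss_history, T=2.0):
    """Dynamic Weight Average sketch. loss_history is a list of
    (L(t-1), L(t-2)) pairs, one per task. Returns K weights that sum
    to K: weight_k = K * exp(r_k / T) / sum_i exp(r_i / T), where
    r_k = L_k(t-1) / L_k(t-2). Temperature T smooths the softmax."""
    K = len(loss_history)
    # warm-up: equal weights until two epochs of losses are available
    if any(prev2 == 0 for _, prev2 in loss_history):
        return [1.0] * K
    ratios = [prev1 / prev2 for prev1, prev2 in loss_history]
    exps = [math.exp(r / T) for r in ratios]
    s = sum(exps)
    return [K * e / s for e in exps]
```

A typical use would recompute the weights once per epoch and scale the identity and triplet losses by them before summing.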
