Abstract

Benefiting from their night-viewing capability, infrared images have been widely applied in surveillance systems as an effective complement to visible-light images. However, the development of infrared pedestrian detection is still impeded by the weak features and limited diversity of infrared images. To address these two problems, we designed a multi-task learning framework for pedestrian detection that incorporates a semantic segmentation branch and a domain adaptation branch. The semantic segmentation branch, composed of a UNet network with a Swin Transformer, applies spatial constraints to pedestrian detection. The domain adaptation branch aligns the features of infrared and visible-light images to increase scene diversity. In addition, the three tasks share a basic feature extraction network to reduce computational cost. Experimental results show that the average precision (AP) of our method exceeds that of the EfficientDet network by 2.0% on the XDU-NIR2020 dataset and by 2.2% on the CVC-09 dataset.
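
The abstract does not give the training objective, so the following is only a minimal sketch, assuming a weighted-sum multi-task loss, of how one shared feature extractor can serve three tasks. Features are computed once per image and reused by the detection and segmentation heads, while a domain-alignment term compares infrared and visible-light features. The class `MultiTaskModel`, the loss weights `lambda_seg` and `lambda_da`, the placeholder head losses, and the use of a linear MMD (squared distance between mean feature vectors) as the alignment term are all hypothetical stand-ins, not the authors' method; the actual domain adaptation branch may, for example, be adversarial.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]


def mean_vector(feats: Sequence[Vector]) -> Vector:
    """Element-wise mean of a non-empty batch of feature vectors."""
    if not feats:
        raise ValueError("batch must be non-empty")
    n, dim = len(feats), len(feats[0])
    return [sum(f[i] for f in feats) / n for i in range(dim)]


def linear_mmd(a: Sequence[Vector], b: Sequence[Vector]) -> float:
    """Hypothetical alignment loss: squared Euclidean distance between batch means."""
    ma, mb = mean_vector(a), mean_vector(b)
    return sum((x - y) ** 2 for x, y in zip(ma, mb))


@dataclass
class MultiTaskModel:
    """Hypothetical model: one shared backbone feeds detection, segmentation and alignment."""
    backbone: Callable[[Vector], Vector]   # shared feature extractor
    det_loss: Callable[[Vector], float]    # placeholder detection loss on shared features
    seg_loss: Callable[[Vector], float]    # placeholder segmentation loss on shared features
    lambda_seg: float = 1.0                # hypothetical weight of the segmentation term
    lambda_da: float = 0.1                 # hypothetical weight of the alignment term
    backbone_calls: int = 0                # counts feature extractions to show the sharing

    def features(self, image: Vector) -> Vector:
        self.backbone_calls += 1
        return self.backbone(image)

    def total_loss(self, infrared: Sequence[Vector], visible: Sequence[Vector]) -> float:
        # Each image passes through the backbone exactly once; all heads reuse the result.
        ir_feats = [self.features(x) for x in infrared]
        vis_feats = [self.features(x) for x in visible]
        # Assumption: task losses use infrared features; visible features serve only alignment.
        l_det = sum(self.det_loss(f) for f in ir_feats) / len(ir_feats)
        l_seg = sum(self.seg_loss(f) for f in ir_feats) / len(ir_feats)
        l_da = linear_mmd(ir_feats, vis_feats)
        return l_det + self.lambda_seg * l_seg + self.lambda_da * l_da
```

With four input images the backbone runs four times, whereas three independent single-task networks would each need their own feature extraction; this is the kind of saving the shared network is meant to provide.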
