Abstract

Multispectral object detection (MOD), which augments object detection (OD) with information from thermal images to cope robustly with complex illumination conditions, has garnered significant attention. However, existing MOD methods typically require a considerable amount of annotated data for training. Inspired by few-shot learning, we propose a novel task, few-shot multispectral object detection (FSMOD), which aims to accomplish MOD using only a few annotated samples from each category. Specifically, we first design a cross-modality interaction (CMI) module, which leverages different attention mechanisms to exchange information between the visible and thermal modalities during backbone feature extraction. Guided by this interaction process, the detector is able to extract modality-specific backbone features with better discrimination. To improve the few-shot learning ability of the detector, we also design a semantic prototype metric (SPM) loss that integrates semantic knowledge, i.e., word embeddings, into the optimization of the embedding space. Semantic knowledge provides a stable category representation when visual information is insufficient. Extensive experiments on the customized FSMOD dataset demonstrate that the proposed method achieves state-of-the-art performance.
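
To illustrate how word embeddings could serve as class prototypes in a metric loss, the following is a minimal Python sketch. It is not the SPM loss itself, whose exact formulation is not given in this abstract. It assumes that each category's word embedding has already been projected to the same dimension as the visual features, and it swaps in a plain cosine-similarity softmax cross-entropy as the metric. The names semantic_prototype_loss, prototypes, and temperature are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)  # small epsilon avoids division by zero


def semantic_prototype_loss(features, labels, prototypes, temperature=0.1):
    """Mean cross-entropy over cosine similarities to semantic prototypes.

    features:    list of visual feature vectors, one per object proposal
    labels:      list of integer category indices, one per feature
    prototypes:  list of per-category vectors, assumed here to be word
                 embeddings already projected to the feature dimension
    temperature: assumed scaling factor that sharpens the softmax
    """
    total = 0.0
    for feat, label in zip(features, labels):
        logits = [cosine(feat, proto) / temperature for proto in prototypes]
        # Numerically stable log-sum-exp over all categories.
        peak = max(logits)
        log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
        # Negative log-probability of the correct category.
        total += log_sum - logits[label]
    return total / len(features)


# Toy data: two categories with orthogonal prototypes.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
features = [[0.9, 0.1], [0.1, 0.9]]

aligned_loss = semantic_prototype_loss(features, [0, 1], prototypes)
swapped_loss = semantic_prototype_loss(features, [1, 0], prototypes)
```

In this toy setup the loss is lower when each feature is labeled with the category whose prototype it points toward. Because the prototypes are fixed by the word embeddings and not estimated from a handful of images, they remain stable however few visual samples are available.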
