Abstract

Few-shot object detection aims to learn to detect novel objects from only a few annotated samples. Most training frameworks fuse high-dimensional query features with semantic information from the support images to learn to recognize and localize novel objects in the query images. Most prior works directly use a cross-correlation mechanism to integrate semantic information from the support features. However, such operations introduce noise into the query features, confusing the generation of region proposals and degrading the final localization precision. In this paper, we focus on fully mining and integrating the support features that are conducive to generating region proposals, so as to further improve the stability and accuracy of the few-shot object detector. We propose a cross-attention redistribution (CAReD) module to adaptively integrate support features into query features, effectively suppressing harmful support features and enhancing the region features of novel categories. In addition, to classify novel instances accurately, it is necessary to minimize the intra-class distance while maximizing the inter-class distance. To this end, this paper proposes a network training strategy based on contrastive learning, which better supervises the training of CAReD and, more importantly, effectively improves the classification precision of the predicted bounding boxes. Experiments on the Pascal VOC and MS-COCO datasets show that CAReD significantly improves upon two baseline detectors (+3.6% on the Pascal VOC benchmark and +4.4% on the MS-COCO benchmark), achieving state-of-the-art results under few-shot detection settings.
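The intra-/inter-class distance objective mentioned above can be illustrated with a generic supervised-contrastive loss. The sketch below is a minimal NumPy illustration of that general idea, not the paper's actual training strategy; the function name and the temperature value are assumptions made for the example:

```python
import numpy as np

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Toy supervised contrastive loss over L2-normalized features.

    Pulls same-class embeddings together (small intra-class distance)
    and pushes different-class embeddings apart (large inter-class
    distance). Generic SupCon-style sketch, not CAReD's exact loss.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    # L2-normalize so dot products are cosine similarities
    features = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = features @ features.T / temperature
    n = len(labels)
    total = 0.0
    for i in range(n):
        # positives: other samples sharing sample i's label
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue
        others = [j for j in range(n) if j != i]
        # log of the softmax denominator over all other samples
        log_denom = np.log(np.exp(sim[i, others]).sum())
        total += -sum(sim[i, j] - log_denom for j in pos) / len(pos)
    return total / n
```

With this loss, a batch whose classes form tight, well-separated clusters scores lower than one whose classes are intermixed, which is exactly the supervision signal the contrastive strategy provides to the classification head.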
