Deep learning-based object detectors typically demand abundant annotated data for training. In practice, however, only limited training data may be available, making Few-Shot Object Detection (FSOD) an attractive research topic. Existing two-stage, proposal-based Faster R-CNN detectors for FSOD struggle to match the performance of models trained on large datasets. We argue that detectors trained with limited samples cannot establish robust comparison-based relationships, and that existing FSOD methods exploit these relationships only during the training phase. To address these issues, we draw inspiration from neuroscience studies and propose Residual Contrast Faster R-CNN (RcFRCN). RcFRCN incorporates two novel customized contrast blocks: a Residual Spatial Contrast Block and a Residual Proposal Contrast Block. These blocks capture cross-spatial and cross-proposal contrast information, enhancing both the training and testing phases. We conduct comprehensive experiments on two FSOD benchmarks, PASCAL VOC and MS-COCO. RcFRCN achieves an mAP of 21.9 under the 30-shot setting on MS-COCO, and AP scores of 69.1, 55.8, and 64.0 under the 10-shot settings of the three PASCAL VOC splits, respectively. Moreover, we apply RcFRCN to remote sensing imagery and use our contrast blocks for open-vocabulary detection. Experimental results on these tasks further demonstrate the robustness and generalization ability of our method.
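The abstract does not specify how the contrast blocks compute their contrast signals. As an illustration only, a minimal NumPy sketch of one plausible residual-contrast operation is given below: each feature is contrasted against the mean of its context (spatial locations for the spatial block, other proposals for the proposal block), and the contrast is added back through a residual connection. The function names, tensor shapes, and the mean-subtraction formulation are all assumptions for exposition, not the authors' implementation.

```python
import numpy as np

def residual_spatial_contrast(x):
    """Hypothetical sketch of a residual spatial-contrast block.

    x: feature map of shape (C, H, W). Each spatial location is
    contrasted against the spatial mean of the map (an assumed
    formulation), and the contrast signal is added back residually.
    """
    spatial_mean = x.mean(axis=(1, 2), keepdims=True)  # (C, 1, 1)
    contrast = x - spatial_mean                        # cross-spatial contrast
    return x + contrast                                # residual combination

def residual_proposal_contrast(p):
    """Hypothetical sketch of a residual proposal-contrast block.

    p: pooled proposal features of shape (N, D). Each proposal is
    contrasted against the mean of all proposals in the image
    (again, an assumed formulation).
    """
    proposal_mean = p.mean(axis=0, keepdims=True)  # (1, D)
    return p + (p - proposal_mean)                 # residual combination
```

Because both operations are simple, input-dependent transforms rather than learned training-time losses, they can be applied at inference as well, which is consistent with the abstract's claim that the blocks enhance both the training and testing phases.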