Abstract

Given a query image of a novel object category at inference time, One-Shot Object Detection (OSOD) aims to detect instances of that category in a target image, guided by the query image and without fine-tuning. The task has many practical applications but remains challenging. Existing attention-based models mainly use query features to modulate the target branch for feature retrieval and information propagation. They generally fail to fully exploit the context extracted from the single template to mine co-occurrent object features, and they neglect the cross-scale and spatial feature misalignment problems, which leads to imprecise results. To address these problems, we propose an adaptive context- and scale-aware feature aggregation module (ACS) that uses global–local context enrichment to preserve contextual features and performs conditioned multi-scale interaction to learn scale-invariant representations. To tackle the spatial misalignment between the query image and the generated proposals, we use a spatial transformer network (STN) to align features, which facilitates the classification subtask. Extensive experiments on multiple OSOD benchmarks show that our approach outperforms the baseline by a large margin and achieves state-of-the-art results. Visualizations of geometric semantic matching between query-target image pairs further support the robustness of the proposed algorithm.
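The feature alignment step mentioned above can be illustrated with the core operation of a spatial transformer: an affine sampling grid followed by bilinear interpolation. The sketch below is a minimal pure-Python illustration and not the paper's implementation. The function names `affine_grid` and `bilinear_sample` are hypothetical, the affine matrix `theta` is set by hand here (an actual STN would presumably predict it with a small localization network), and the feature map is a single-channel toy example.

```python
import math


def affine_grid(theta, height, width):
    """Map every output pixel to a source location in normalized [-1, 1] space.

    theta is a 2x3 affine matrix [[a, b, tx], [c, d, ty]] (hand-set here;
    an assumption standing in for a learned localization network).
    """
    grid = []
    for i in range(height):
        # Row 0 maps to y = -1 and the last row maps to y = +1.
        y = -1.0 + 2.0 * i / (height - 1) if height > 1 else 0.0
        row = []
        for j in range(width):
            x = -1.0 + 2.0 * j / (width - 1) if width > 1 else 0.0
            src_x = theta[0][0] * x + theta[0][1] * y + theta[0][2]
            src_y = theta[1][0] * x + theta[1][1] * y + theta[1][2]
            row.append((src_x, src_y))
        grid.append(row)
    return grid


def bilinear_sample(feature, grid):
    """Sample a 2D feature map at the grid locations with bilinear weights."""
    height, width = len(feature), len(feature[0])

    def pixel(r, c):
        # Locations outside the feature map contribute zero (zero padding).
        if 0 <= r < height and 0 <= c < width:
            return feature[r][c]
        return 0.0

    out = []
    for grid_row in grid:
        out_row = []
        for src_x, src_y in grid_row:
            # Convert normalized coordinates back to fractional pixel indices.
            col = (src_x + 1.0) * (width - 1) / 2.0
            row = (src_y + 1.0) * (height - 1) / 2.0
            c0, r0 = math.floor(col), math.floor(row)
            wc, wr = col - c0, row - r0
            value = ((1 - wr) * (1 - wc) * pixel(r0, c0)
                     + (1 - wr) * wc * pixel(r0, c0 + 1)
                     + wr * (1 - wc) * pixel(r0 + 1, c0)
                     + wr * wc * pixel(r0 + 1, c0 + 1))
            out_row.append(value)
        out.append(out_row)
    return out


# Toy usage: warp a 3x3 "proposal feature" with a horizontal flip.
feature = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
flip = [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
aligned = bilinear_sample(feature, affine_grid(flip, 3, 3))
```

Because both steps are differentiable with respect to `theta`, a network could in principle learn the warp that best aligns proposal features with query features, which is the property that makes this kind of module usable inside a detector.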
