Abstract

In this article, we address one-shot object detection, which mimics the human ability to learn new concepts from limited references: given a single query image of an unseen class, the detector must locate all object instances of that class in a target image. This one-shot learning ability in humans, however, rests on the brain's capacity to quickly extract and process the information shared between query and target images, and reproducing this capacity is a central challenge for any one-shot object detection framework. Moreover, extracting features of the query class from target images is difficult because remote sensing images have complex and diverse backgrounds. To address these issues, we propose a solo-to-collaborative dual-attention network (SCoDANet) that enhances image feature representations hierarchically (single image/image pairs). It consists of three components: 1) a solo-attention head, which strengthens the compactness of intraclass feature representations of an image and suppresses background interference by selectively aggregating similar features along the spatial and channel dimensions, respectively; 2) a dual co-attention module, which guides the RPN to generate the expected set of region proposals related to the query class by mining the co-information of each query–target feature pair; and 3) a nonlinear matching step, which measures the similarity between the query feature and the proposals of the target image to learn a more robust detector. Extensive experiments on two benchmarks demonstrate the effectiveness of our method in the one-shot detection of both seen and unseen object categories.
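
To make the solo-attention head concrete: it aggregates similar features along the spatial and channel dimensions of a single feature map. Below is a minimal PyTorch sketch of that kind of dual (spatial + channel) self-attention, assuming a DANet-style design with learnable residual weights; all module and parameter names here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialSelfAttention(nn.Module):
    """Aggregates similar features across spatial positions."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (b, hw, c')
        k = self.key(x).flatten(2)                     # (b, c', hw)
        attn = F.softmax(q @ k, dim=-1)                # (b, hw, hw) position affinities
        v = self.value(x).flatten(2)                   # (b, c, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x


class ChannelSelfAttention(nn.Module):
    """Aggregates similar features across channels."""

    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        flat = x.flatten(2)                                     # (b, c, hw)
        attn = F.softmax(flat @ flat.transpose(1, 2), dim=-1)   # (b, c, c) channel affinities
        out = (attn @ flat).view(b, c, h, w)
        return self.gamma * out + x


class SoloAttentionHead(nn.Module):
    """Refines one feature map with both attentions and fuses the results."""

    def __init__(self, channels: int):
        super().__init__()
        self.spatial = SpatialSelfAttention(channels)
        self.channel = ChannelSelfAttention()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.spatial(x) + self.channel(x)


# Usage: refine a backbone feature map before it is paired with the query.
features = torch.randn(2, 256, 32, 32)
refined = SoloAttentionHead(256)(features)  # same shape as the input
```

Aggregating along both dimensions serves the two goals the abstract names: the spatial branch pulls together responses from positions with similar appearance (tightening intraclass features), while the channel branch re-weights semantically correlated channels, which helps suppress cluttered remote-sensing backgrounds.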
