Abstract

Sparse R-CNN is a recent object detection paradigm that predicts objects from a sparse set of proposals. It has two main limitations, however. First, its learnable proposal boxes and features are fixed across images, which provides only weak prior information and forces the model through many iterations to refine its predictions. Second, it does not adequately exploit multi-scale information, which leads to sub-optimal detection performance. Building on Sparse R-CNN, we propose $D^{2}$-Det, an efficient detector that incorporates a dynamic prior and dynamic feature fusion. For the dynamic prior, a prior information generator module produces proposal features and boxes for each image, alleviating the inference-inefficient iterative refinement of predictions; we further propose a class score decoupling method to reduce the computational overhead. For dynamic feature fusion, we develop a novel lightweight multi-scale feature fusion module that dynamically aggregates features from all layers for each proposal box, enabling adaptive feature fusion and improving detection precision by nearly 2 AP. Experiments show that $D^{2}$-Det achieves 46.6 AP on COCO 2017 with a ResNet50 backbone and fewer computations, surpassing most state-of-the-art detectors.
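The idea of aggregating features from all layers with weights that depend on each proposal can be illustrated with a small sketch. This is not the module described in the paper, whose internals the abstract does not specify; it is a minimal illustration under assumptions. The function `fuse_levels`, the linear gate `gate_weights`, and the choice of softmax weighting are all hypothetical, and the sketch uses only the Python standard library.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_levels(proposal_feature, level_features, gate_weights):
    """Fuse one proposal's per-level RoI features with input-dependent weights.

    proposal_feature: list of d floats (the proposal's own feature vector)
    level_features:   list of L lists, each with c floats (one RoI feature per level)
    gate_weights:     list of L lists, each with d floats (hypothetical linear gate)
    """
    # One scalar score per pyramid level, computed from the proposal feature.
    scores = [
        sum(w * x for w, x in zip(row, proposal_feature)) for row in gate_weights
    ]
    weights = softmax(scores)
    # Weighted sum across levels, channel by channel.
    num_channels = len(level_features[0])
    fused = [
        sum(weights[lvl] * level_features[lvl][ch] for lvl in range(len(level_features)))
        for ch in range(num_channels)
    ]
    return fused, weights


# Toy usage with random values (assumed sizes: 4 levels, d = 8, c = 16).
random.seed(0)
num_levels, d, c = 4, 8, 16
proposal = [random.gauss(0, 1) for _ in range(d)]
levels = [[random.gauss(0, 1) for _ in range(c)] for _ in range(num_levels)]
gate = [[random.gauss(0, 1) for _ in range(d)] for _ in range(num_levels)]
fused, weights = fuse_levels(proposal, levels, gate)
```

Because the weights are computed from the proposal feature, two proposals in the same image can draw on the pyramid levels in different proportions, which is the sense of "adaptive" assumed here. With an all-zero gate the weights become uniform and the fusion reduces to a plain average over levels.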
