Abstract

Among dense prediction tasks, object detection plays a fundamental role in visual perception and scene understanding. Dense object detection, which localizes objects directly from the feature map, has drawn great attention due to its low cost and high efficiency. Despite years of development, however, the training pipeline of dense object detectors is still compromised by a number of conjunctions. In this paper, we demonstrate the existence of three such conjunctions in the current paradigm of one-stage detectors: 1) only samples assigned as positive in the classification head are used to train the regression head; 2) classification and regression share the same input feature and computational fields defined by the parallel head architecture; and 3) samples distributed in different feature pyramid layers are treated equally when computing the loss. Based on these observations, we propose the Disentangled Dense Object Detector (DDOD), a simple, direct, and efficient framework for 2D detection with strong performance. We further derive two DDOD variants (i.e., DR-CNN and DDETR) following the basic one-stage/two-stage and the recently developed transformer-based pipelines. Specifically, we develop three effective disentanglement mechanisms and integrate them into current state-of-the-art object detectors. Extensive experiments on the MS COCO benchmark show that our approach obtains significant improvements with negligible extra overhead on various detectors. Notably, our best model reaches 55.4 mAP on the COCO test-dev set, achieving new state-of-the-art performance on this competitive benchmark. Additionally, we validate our model on several challenging tasks, including small object detection and crowded object detection; the experimental results further demonstrate the benefit of disentangling these conjunctions. Code is available at https://github.com/zehuichen123/DDOD.
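To make the first conjunction concrete, the sketch below shows how positive samples could be assigned separately to the classification and regression heads instead of sharing a single positive set. This is a minimal, hypothetical PyTorch illustration: the function name, the IoU-based rule, and the thresholds are assumptions for exposition, not the paper's actual cost-based assignment from the DDOD repository.

```python
import torch

def disentangled_assignment(anchors, gt_boxes, cls_iou_thr=0.5, reg_iou_thr=0.4):
    """Assign positives separately for the classification and regression heads.

    Illustrative only: DDOD itself uses its own assignment strategy; here a
    plain IoU threshold is used, with a different (hypothetical) threshold per head.

    anchors:  (N, 4) tensor of anchor boxes in (x1, y1, x2, y2) format
    gt_boxes: (M, 4) tensor of ground-truth boxes in (x1, y1, x2, y2) format
    Returns two boolean masks of shape (N,): positives for cls and for reg.
    """
    # Pairwise IoU between anchors and ground-truth boxes.
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_g = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    lt = torch.max(anchors[:, None, :2], gt_boxes[None, :, :2])   # top-left of intersection
    rb = torch.min(anchors[:, None, 2:], gt_boxes[None, :, 2:])   # bottom-right of intersection
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    iou = inter / (area_a[:, None] + area_g[None, :] - inter + 1e-6)

    best_iou, _ = iou.max(dim=1)        # best-matching ground truth per anchor
    cls_pos = best_iou >= cls_iou_thr   # positive set for the classification head
    reg_pos = best_iou >= reg_iou_thr   # a separate (here looser) set for the regression head
    return cls_pos, reg_pos

if __name__ == "__main__":
    anchors = torch.tensor([[0., 0., 10., 10.],
                            [5., 5., 15., 15.],
                            [20., 20., 30., 30.]])
    gts = torch.tensor([[0., 0., 9., 9.]])
    cls_pos, reg_pos = disentangled_assignment(anchors, gts)
    print(cls_pos, reg_pos)  # the second anchor may be positive for reg but not for cls
```

With two independent masks, the regression head can be supervised on samples that the classification head rejects, which is the kind of decoupling the first disentanglement mechanism targets.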
