Abstract

Object detection on a single-level feature map is a difficult task: object representations at different scales must be extracted from one feature map, and the balance between the quality and quantity of positive samples plays a key role in model training. YOLOF, which runs at real-time detection speed, partially addresses the problems of object scale and sample quantity balance. To further improve performance, especially on smaller objects, we propose a new object detector called M2YOLOF. Its main ingredients are a multi-in-single-out encoder that incorporates attention to strengthen the local features and global representation of objects at multiple scales, and a dynamic sample selection policy that uses effective receptive fields to rationalize the number of positive samples. M2YOLOF thereby strengthens the contextual details of the feature map and keeps the set of training samples balanced. Extensive experiments on the COCO benchmark demonstrate the effectiveness of our method: with an image size of [1333, 800] and ResNet50 as the backbone, it runs at 29 FPS on a 2080Ti and achieves 39.2 AP. This is 1.7 AP higher than YOLOF, while the GFLOPs of our method increase by less than 9%.
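
The reported figures can be put in context with a little arithmetic. The sketch below uses only the numbers quoted in the abstract; the derived quantities (the implied YOLOF baseline, the relative gain, and the per-image latency) are not stated in the source, and the variable names are illustrative.

```python
# Figures quoted in the abstract.
m2yolof_ap = 39.2   # COCO AP reported for M2YOLOF
ap_gain = 1.7       # stated AP gain over YOLOF
fps = 29            # reported speed on a 2080Ti

# Derived values (assumptions computed here, not stated in the abstract).
yolof_ap = round(m2yolof_ap - ap_gain, 1)      # implied YOLOF baseline AP
relative_gain_pct = 100 * ap_gain / yolof_ap   # gain relative to that baseline
latency_ms = 1000 / fps                        # average time per image

print(f"implied YOLOF AP: {yolof_ap}")
print(f"relative AP gain: {relative_gain_pct:.1f}%")
print(f"per-image latency: {latency_ms:.1f} ms")
```

Under these figures the implied baseline is 37.5 AP, assuming the comparison uses the same backbone and image size.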
