M2YOLOF: Based on effective receptive fields and multiple-in-single-out encoder for object detection

Qijin Wang, Yu Qian, Yating Hu, Chao Wang, Xiaodong Ye, Hongqiang Wang

doi:10.1016/j.eswa.2022.118928

Abstract

Object detection with a single-level feature is a difficult task. Representations of objects at different scales must be extracted from one feature map, and the balance between the quality and the quantity of positive samples plays a key role in model training. YOLOF, which runs at real-time speed, partially solves the problems of object scale and sample-quantity balance. To further improve performance, especially on small objects, we propose a new object detector called M2YOLOF. Its main ingredients are a multiple-in-single-out encoder that incorporates attention to strengthen the local features and global representation of objects at multiple scales, and a dynamic sample selection policy that uses effective receptive fields to set a reasonable number of positive samples. M2YOLOF thus strengthens the contextual detail of the feature map and balances the selection of training samples. Extensive experiments on the COCO benchmark demonstrate the effectiveness of our method: with an image size of [1333, 800] and ResNet-50 as the backbone, it runs at 29 FPS on a 2080Ti and achieves 39.2 AP. This is 1.7 AP higher than YOLOF, while the GFLOPs of our method increase by less than 9%.
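The sample selection policy relies on the notion of an effective receptive field, which is usually much smaller than the theoretical receptive field of a layer. The sketch below illustrates that gap; it does not reproduce M2YOLOF's selection policy. It assumes a toy 1-D stack of stride-1 convolutions with uniform weights and no nonlinearity. In that setting, the gradient of the centre output with respect to the inputs is the repeated self-convolution of the kernel. The helper names (`erf_profile`, `effective_rf`) and the 95% mass threshold are hypothetical choices made for illustration.

```python
def conv1d_full(a, b):
    """Full 1-D discrete convolution of two weight lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def erf_profile(num_layers, kernel_size):
    """Gradient of the centre output w.r.t. each input position for a toy stack
    of stride-1 convolutions with uniform weights (assumption: linear, no bias)."""
    kernel = [1.0 / kernel_size] * kernel_size
    profile = [1.0]
    for _ in range(num_layers):
        profile = conv1d_full(profile, kernel)
    return profile


def theoretical_rf(num_layers, kernel_size):
    """Number of input positions that can influence one output unit."""
    return num_layers * (kernel_size - 1) + 1


def effective_rf(profile, mass=0.95):
    """Width of the smallest centred window holding `mass` of the gradient
    (hypothetical threshold; other definitions of the ERF width exist)."""
    centre = len(profile) // 2
    total = sum(profile)
    radius = 0
    while sum(profile[centre - radius: centre + radius + 1]) < mass * total:
        radius += 1
    return 2 * radius + 1


if __name__ == "__main__":
    for depth in (4, 16, 64):
        prof = erf_profile(depth, 3)
        print(depth, theoretical_rf(depth, 3), effective_rf(prof))
```

As depth grows, the theoretical width grows linearly while the effective width grows much more slowly. This is why an effective-receptive-field criterion can count positives differently than anchor or stride geometry alone would suggest.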

Full Text