Abstract

Scale features play a crucial role in object detectors, and existing methods typically adopt a feature pyramid built from multiple feature maps. This paper focuses on a single feature map and proposes an encoder called SFMF (single feature map fusion), which performs multi-scale feature fusion on that one map. A key technique underlying SFMF is a fine-grained weighting method that quickly discards unneeded pixel channels during fusion. YOLOF (You Only Look One-level Feature) with SFMF achieves 38.5 mAP with ResNet50 and 40.3 mAP with ResNet101, improving on the baseline by 0.8 and 0.5 mAP, respectively. Meta-ACON is used in the backbone to learn automatically whether or not to activate each neuron. With Meta-ACON and SFMF, YOLOF achieves 39.1 and 40.4 mAP, surpassing the baseline by 1.4 and 0.6 mAP on COCO val-dev. In addition, YOLOF with SFMF achieves 54.8 mAP on the aircraft detection dataset, an absolute improvement of 4.9 mAP, at a slight cost in inference speed (1 FPS).
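To make the idea of fusing several scales on one map more concrete, the following is a minimal sketch only, not the SFMF encoder itself, whose layers the abstract does not specify. It assumes that the multi-scale branches can be imitated by averaging a single feature map over windows of different sizes, and that the fine-grained weighting can be imitated by a softmax over scales computed separately for every pixel and channel. The names `box_blur`, `fuse_single_map`, `kernel_sizes`, and `gate_logits` are hypothetical and chosen for illustration.

```python
import numpy as np


def box_blur(x, k):
    """Average each channel of x (C, H, W) over a k x k window, k odd.

    Assumption: this stands in for a branch with a larger receptive
    field; the paper's actual branches are not described in the abstract.
    """
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    _, h, w = x.shape
    out = np.zeros(x.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += xp[:, dy:dy + h, dx:dx + w]
    return out / (k * k)


def fuse_single_map(x, kernel_sizes=(1, 3, 5), gate_logits=None):
    """Fuse several scales derived from ONE feature map.

    Returns the fused map (C, H, W) and the weights (S, C, H, W).
    Assumption: weights are a softmax over the scale axis, computed
    independently per pixel and per channel. A weight close to zero
    effectively drops that branch's value at that pixel channel.
    """
    branches = np.stack([box_blur(x, k) for k in kernel_sizes])  # (S, C, H, W)
    if gate_logits is None:
        # Hypothetical choice: use the branch activations as logits.
        # A trained model would predict these with learned layers.
        gate_logits = branches
    z = gate_logits - gate_logits.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(z)
    weights /= weights.sum(axis=0, keepdims=True)
    fused = (weights * branches).sum(axis=0)
    return fused, weights
```

The fused output keeps the shape of the input map, so a detection head that expects a single-level feature, as in YOLOF, could consume it unchanged. In a learned version the logits would come from trainable layers rather than from the activations themselves.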
